Abstract

We investigate the second order asymptotics (source dispersion) of the successive refinement problem. Similarly 
to the classical definition of a successively refinable source, we say that a source is strongly successively refinable 
if successive refinement coding can achieve the second order optimum rate (including the dispersion terms) at both 
decoders. We establish a sufficient condition for strong successive refinability. We show that any discrete source under 
Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable. 

We also demonstrate how successive refinement ideas can be used in point-to-point lossy compression problems 
in order to reduce complexity. We give two examples, the binary-Hamming and Gaussian-quadratic cases, in which a 
layered code construction results in a low complexity scheme that attains optimal performance. For example, when the
number of layers grows with the block length n, we show how to design an algorithm that asymptotically
achieves the rate-distortion bound.

Index Terms 

Complexity, layered code, rate-distortion, refined strong covering lemma, source dispersion, strong successive 
refinability, successive refinement. 

I. Introduction 

In the successive refinement problem, an encoder wishes to send a source to two decoders with different target
distortions. Instead of designing separate coding schemes, the successive refinement encoder uses a code for the
first decoder, which has a weaker link, and sends extra information to the second decoder on top of the message
of the first decoder. In general, the performance of a successive refinement coding scheme is worse than separate
coding for each decoder. However, for some cases, we can simultaneously achieve the optimum rates for both
decoders as if the optimum codes were used separately. In this case, we say the source is successively refinable.
Necessary and sufficient conditions for successive refinability were independently proposed by Koshelev [?], [?]
and Equitz and Cover [?]. Rimoldi [?] found the full rate-distortion region of the successive refinement problem
including non-successively refinable sources. Kanlis and Narayan [?] extended the result to the error exponent that
quantifies “how fast the excess distortion probability decays”. Tuncel [?] characterized the entire region of
rate-distortion-exponents with separate handling of the two error events. Both lines of work considered error exponents
in the spirit of Marton [?], who characterized the error exponent for the point-to-point case.

The material in this paper has been presented in part at the 2013 51st Annual Allerton Conference on Communication, Control, and Computing
(Allerton), and at the 2014 International Symposium on Information Theory. This work was supported by the NSF Center for Science of
Information under Grant Agreement CCF-0939370.

For the point-to-point source coding problem, Ingber and Kochman [?] and Kostina and Verdú [?] independently
proposed an asymptotic analysis that complements the error exponent analysis. In this setting, the figure of merit
is the minimum achievable rate when the excess distortion probability $\epsilon$ and the block length $n$ are fixed. This can
be quantified by the source dispersion. For an i.i.d. source with law $P$, the minimum rate can be approximated by
$R(P,D) + \sqrt{V(P,D)/n}\,Q^{-1}(\epsilon)$, where $R(P,D)$ and $V(P,D)$ are, respectively, the rate-distortion function and
dispersion of a source $P$ at distortion level $D$. We can consider this rate as a “second order” optimum rate (where
the classical rate-distortion function is the first order result).

With this stronger notion of optimality, it is natural to ask whether successive refinement schemes can achieve
the second order optimum rates at both decoders simultaneously. An obvious necessary condition for the existence
of such schemes is that the source be successively refinable, so we refer to such a source as “strongly successively
refinable” (formal definitions follow in the sequel). In this paper, we present a second order achievability result for
the successive refinement problem. As a corollary, we derive a sufficient condition for strong successive refinability
and show that a source $P$ is strongly successively refinable if all sources in a neighborhood of $P$ are successively
refinable.

In the second part of the paper, we show that successive refinement codes can be useful in the point-to-point source
coding problem when we want to achieve lower encoding complexity. The idea is that finding the best representing
codeword in a successive manner is often easier than finding a codeword from the set of all codewords, which
normally has exponential complexity. Moreover, storing exponentially many codewords is often prohibitive, while
successive refinement encoding can reduce the size of codebooks. Our findings here contribute to the recent line
of work on reducing the complexity of rate-distortion codes, cf. [?] and references therein.

We aim to study the general approach of using successive encoding to reduce complexity. We denote this approach
by “layered coding”, a family that includes all coding schemes that can be implemented in a successive manner.
Basically, the layered coding scheme searches for an appropriate codeword over a tree structure where the
number of decoders corresponds to the level of the tree. The larger the tree, the faster the codeword can be found,
and therefore the lower the encoding complexity. In order to reduce the encoding complexity significantly, we generalize
the result to the case where the number of decoders increases with the block length $n$. This is different from
classical successive refinability, where only a fixed number of decoders is considered. On the other hand, the larger
tree structure restricts the class of coding schemes, and therefore too many decoders may cause a rate loss. Our
result for this setting characterizes an achievable trade-off between encoding complexity (how fast we can find the
codeword) and performance (how much we end up compressing). Note that SPARC [?] and CROM [?] are
manifestations of the layered coding approach that attain good performance.

The rest of the paper is organized as follows. In Section II, we revisit the known results about successive
refinement and source dispersion. Section III provides the problem setting. We present our main results in Section
IV, where proof details are given in Section V. Section VI is dedicated to a layered coding scheme, and we conclude
in Section VII.

Notation: $X^n$ and $\mathbf{X}$ denote an $n$-dimensional random vector $(X_1, X_2, \ldots, X_n)$, while $x^n$ and $\mathbf{x}$ denote a specific
realization of it. When we have two random vectors, we use notation such as $X_1^n = (X_{1,1}, X_{1,2}, \ldots, X_{1,n})$
and $X_2^n = (X_{2,1}, X_{2,2}, \ldots, X_{2,n})$.


II. Preliminaries 

A. Source Dispersion 

Consider an i.i.d. source $X^n$ with law $P$ where the source alphabet is $\mathcal{X}$ and the reconstruction alphabet is
$\hat{\mathcal{X}}$. Let $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be a distortion measure where $d(x^n, \hat{x}^n) = \frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$. It is well known
that the rate-distortion function $R(P,D)$ is the optimal asymptotic compression rate for which distortion $D$ can be
achieved. However, this first order optimum rate can be achieved only when the block length $n$ goes to infinity.
Beyond the first order rate, we can consider two asymptotic behaviors$^1$, which are the excess distortion exponent [?] and
the source dispersion [?], [?]. The former considers how fast the excess distortion probability $\Pr[d(X^n, \hat{X}^n) > D]$
decays, while the latter considers how fast the minimum rate converges to $R(P,D)$ when the
excess distortion probability $\epsilon$ and the block length $n$ are given. It was shown that the difference between the minimum
rate for fixed $n$ and $R(P,D)$ is inversely proportional to the square root of $n$. More formally, let $R_{P,D,\epsilon}(n)$ be the
minimum compression rate for which the excess distortion probability is smaller than $\epsilon$. The result is given by:

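Theorem 1 ([?]): Suppose $R(P,D)$ is twice differentiable$^2$ with respect to $D$ and the elements of $P$ in some
neighborhood of $(P,D)$. Then

$$R_{P,D,\epsilon}(n) = R(P,D) + \sqrt{\frac{V(P,D)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) \tag{1}$$

where $V(P,D)$ is the source dispersion, given by

$$V(P,D) \triangleq \mathrm{VAR}\left[R'(X,D)\right] \tag{2}$$

$$= \sum_{x \in \mathcal{X}} P(x)\left(R'(x,D)\right)^2 - \left[\sum_{x \in \mathcal{X}} P(x) R'(x,D)\right]^2 \tag{3}$$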

$^1$ These asymptotic approaches analyze the excess distortion probability. Other approaches exist which analyze the average achievable distortion [?], [?].

$^2$ We say $R(P,D)$ is differentiable at $P$ if there is an extension $\tilde{R}(\cdot, D) : \mathbb{R}^{|\mathcal{X}|} \to \mathbb{R}$ which is differentiable. Under this definition, $R'(x,D)$
and $V(P,D)$ are well and uniquely defined. Details are given in Appendix A.
and $R'(x,D)$ denotes the derivative of $R(P,D)$ with respect to the probability $P(x)$:

$$R'(x,D) \triangleq \left[\frac{\partial R(Q,D)}{\partial Q(x)}\right]_{Q=P} \tag{4}$$
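As a numerical illustration of (1)-(4), the following minimal sketch evaluates the dispersion and the second order rate for a binary source under Hamming distortion. It assumes $0 < D < \min(p, 1-p)$, where $R(P,D) = h(p) - h(D)$ in nats, and it uses the extension $-\sum_x Q(x)\log Q(x)$ of the entropy term to compute $R'(x,D)$; the function names are illustrative and not taken from the paper.

```python
import math
from statistics import NormalDist


def q_inv(eps):
    """Inverse of the Gaussian tail function Q(t) = Pr[N(0,1) > t]."""
    return NormalDist().inv_cdf(1.0 - eps)


def dispersion(P, r_prime):
    """V(P, D) = VAR[R'(X, D)] as in (2)-(3), for a finite-alphabet law P."""
    mean = sum(p * r for p, r in zip(P, r_prime))
    second_moment = sum(p * r * r for p, r in zip(P, r_prime))
    return second_moment - mean * mean


def binary_entropy(p):
    """Binary entropy h(p) in nats, for 0 < p < 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def second_order_rate_binary(p, D, n, eps):
    """Return (R + sqrt(V/n) Q^{-1}(eps), V) for a Bernoulli(p) source.

    Assumption: 0 < D < min(p, 1 - p), so that R(P, D) = h(p) - h(D).
    """
    P = [1 - p, p]
    # Derivative of -sum_x Q(x) log Q(x) with respect to Q(x), evaluated at Q = P.
    # The h(D) term does not depend on Q, so it does not contribute.
    r_prime = [-math.log(px) - 1.0 for px in P]
    V = dispersion(P, r_prime)
    R = binary_entropy(p) - binary_entropy(D)
    return R + math.sqrt(V / n) * q_inv(eps), V
```

Because the variance is invariant to additive constants, the result reduces to the variance of $\log P(X)$; in particular the dispersion vanishes for the uniform binary source.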


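We have a similar result for the Gaussian source under quadratic distortion:

Theorem 2: Consider an i.i.d. Gaussian source $X^n$ distributed according to $\mathcal{N}(0, \sigma^2)$, and quadratic
distortion, i.e., $d(x^n, \hat{x}^n) = (1/n)\sum_{i=1}^{n} (x_i - \hat{x}_i)^2$. Then

$$R_{P,D,\epsilon}(n) = \frac{1}{2}\log\frac{\sigma^2}{D} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) \tag{5}$$

Note that the dispersion of the Gaussian source is $V(P,D) = 1/2$ nats$^2$/source symbol for all $D < \sigma^2$.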
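The Gaussian approximation in (5) is easy to evaluate numerically. The sketch below drops the $O(\log n / n)$ term and returns the rate in nats per source symbol; the function name and argument names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def gaussian_second_order_rate(sigma2, D, n, eps):
    """Evaluate (1/2) log(sigma^2 / D) + sqrt(1/(2n)) Q^{-1}(eps) in nats.

    The O(log n / n) term of (5) is ignored in this sketch.
    """
    if not 0 < D < sigma2:
        raise ValueError("requires 0 < D < sigma^2")
    first_order = 0.5 * math.log(sigma2 / D)
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return first_order + math.sqrt(1.0 / (2 * n)) * q_inv
```

For $\epsilon < 1/2$ the second order term is positive, so the finite block length rate exceeds the rate-distortion function, and the gap shrinks as $1/\sqrt{n}$.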


B. Successive Refinement 

The successive refinement problem with two decoders can be formulated as follows. Again, let $X^n$ be i.i.d. with
law $P$. The encoder sends a pair of messages $(m_1, m_2)$ where $1 \le m_i \le M_i$ for $i \in \{1, 2\}$. The first decoder takes
$m_1$ and reconstructs $\hat{X}_1^n(m_1) \in \hat{\mathcal{X}}_1^n$, while the second decoder takes $(m_1, m_2)$ and reconstructs $\hat{X}_2^n(m_1, m_2) \in \hat{\mathcal{X}}_2^n$.
Note that $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ denote the respective reconstruction alphabets of the decoders. The $i$-th decoder employs the
distortion measure $d_i(\cdot,\cdot) : \mathcal{X} \times \hat{\mathcal{X}}_i \to [0,\infty)$ and wants to recover the source $x^n$ with distortion $D_i$, i.e.,

$$d_i(x^n, \hat{x}_i^n) \le D_i \quad \text{for } i \in \{1,2\}. \tag{6}$$

The rates of the code are defined as

$$R_1 = \frac{1}{n}\log M_1 \tag{7}$$

$$R_2 = \frac{1}{n}\log M_1 M_2. \tag{8}$$

An $(n, R_1, R_2, D_1, D_2, \epsilon)$-successive refinement code is a coding scheme with block length $n$ and excess distortion
probability $\epsilon$ where the rates are $(R_1, R_2)$ and the target distortions are $(D_1, D_2)$. Since we have two decoders, the excess
distortion probability is defined by $\Pr[d_i(X^n, \hat{X}_i^n) > D_i \text{ for some } i]$.

Definition 1: A rate-distortion tuple $(R_1, R_2, D_1, D_2)$ is achievable if there is a family of $(n, R_1^{(n)}, R_2^{(n)}, D_1, D_2,
\epsilon^{(n)})$-successive refinement codes where

$$\lim_{n \to \infty} R_i^{(n)} = R_i \quad \text{for } i \in \{1, 2\}, \tag{9}$$

$$\lim_{n \to \infty} \epsilon^{(n)} = 0. \tag{10}$$

The achievable rate-distortion region is known:

Theorem 3 ([?]): Consider a discrete memoryless source $X^n$ with law $P$. The rate-distortion tuple $(R_1, R_2, D_1, D_2)$
is achievable if and only if there is a joint law $P_{X,\hat{X}_1,\hat{X}_2}$ of random variables $(X, \hat{X}_1, \hat{X}_2)$ (where $X$ is distributed
according to $P$) such that

$$I(X; \hat{X}_1) \le R_1 \tag{11}$$

$$I(X; \hat{X}_1, \hat{X}_2) \le R_2 \tag{12}$$

$$\mathbb{E}\left[d_i(X, \hat{X}_i)\right] \le D_i \quad \text{for } i \in \{1, 2\}. \tag{13}$$
In some cases, we can achieve the optimum rates at both decoders simultaneously:

Definition 2: For $i \in \{1, 2\}$, let $R_i(P, D_i)$ denote the rate-distortion function of the source $P$ when the distortion
measure is $d_i(\cdot,\cdot)$ and the distortion level is $D_i$. If $(R_1(P, D_1), R_2(P, D_2), D_1, D_2)$ is achievable, then we say the
source is successively refinable at $(D_1, D_2)$. Furthermore, if the source is successively refinable at $(D_1, D_2)$ for all
non-degenerate $D_1, D_2$ (i.e., for which $R_1(P, D_1) < R_2(P, D_2)$), then we say the source is successively refinable.
A necessary and sufficient condition for successive refinability is known.

Theorem 4: A source $P$ is successively refinable at $(D_1, D_2)$ if and only if there exists a conditional
distribution $P_{\hat{X}_1, \hat{X}_2 | X}$ such that $X - \hat{X}_2 - \hat{X}_1$ forms a Markov chain and

$$R_i(P, D_i) = I(X; \hat{X}_i) \tag{14}$$

$$\mathbb{E}\left[d_i(X, \hat{X}_i)\right] \le D_i \tag{15}$$

for $i \in \{1, 2\}$.

The condition in the theorem holds for a Gaussian source under quadratic distortion and for any
discrete memoryless source under Hamming distortion. Note that successive refinability is not shared by
all sources and distortion measures. For instance, symmetric Gaussian mixtures under quadratic distortion are not
successively refinable [?]. The above results on successive refinability can be generalized to the case of $k$ decoders.

Note that we can also define successive refinability using $R(P, D_1, D_2)$, where $R(P, D_1, D_2)$ is the minimum
rate $R_2$ such that $(R_1(P, D_1), R_2, D_1, D_2)$ is achievable. Using Theorem 3, we can characterize $R(P, D_1, D_2)$:

$$R(P, D_1, D_2) = \inf_{\substack{P_{\hat{X}_1, \hat{X}_2 | X} :\ \mathbb{E}[d_1(X, \hat{X}_1)] \le D_1,\\ \mathbb{E}[d_2(X, \hat{X}_2)] \le D_2,\ I(X; \hat{X}_1) \le R_1(P, D_1)}} I(X; \hat{X}_1, \hat{X}_2). \tag{16}$$

Definition 2 implies that the source is successively refinable at $(D_1, D_2)$ if and only if $R(P, D_1, D_2) = R_2(P, D_2)$.


III. Problem Setting 

We consider the successive refinement problem with two decoders. Let $X^n = (X_1, \ldots, X_n)$ be i.i.d. with law
$P$, where the source alphabet is $\mathcal{X}$. An encoder $f^{(n)} = (f_1^{(n)}, f_2^{(n)})$ maps a source sequence to a pair of messages:

$$f_1^{(n)} : \mathcal{X}^n \to \{1, \ldots, M_1\} \tag{17}$$

$$f_2^{(n)} : \mathcal{X}^n \to \{1, \ldots, M_2\}. \tag{18}$$

The first decoder receives only the output of $f_1^{(n)}(X^n)$, and therefore we say that its rate is $R_1 = (1/n) \log M_1$.
The second decoder receives the output of both functions, so its rate is $R_2 = (1/n) \log M_1 M_2$.

Decoder 1 employs a decoder $g_1^{(n)} : \{1, \ldots, M_1\} \to \hat{\mathcal{X}}_1^n$ and decoder 2 employs a decoder $g_2^{(n)} : \{1, \ldots, M_1\} \times
\{1, \ldots, M_2\} \to \hat{\mathcal{X}}_2^n$, where $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ are the reconstruction alphabets for each decoder. Decoder $i$ has its own
distortion measure $d_i : \mathcal{X} \times \hat{\mathcal{X}}_i \to [0, \infty)$ with a target distortion $D_i$. Both $d_1$ and $d_2$ are symbol by symbol
distortion measures which induce block distortion measures by

$$d_i(x^n, \hat{x}_i^n) = \frac{1}{n}\sum_{j=1}^{n} d_i(x_j, \hat{x}_{i,j}) \tag{19}$$

for all $i \in \{1, 2\}$, $x^n \in \mathcal{X}^n$, $\hat{x}_1^n \in \hat{\mathcal{X}}_1^n$ and $\hat{x}_2^n \in \hat{\mathcal{X}}_2^n$. The setting is described in Figure 1.

Fig. 1. Successive Refinement


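Definition 3: We say that $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$ is achievable if there exists an encoder-decoder pair that
satisfies

$$\Pr\left[d_1\left(X^n, g_1^{(n)}(f_1^{(n)}(X^n))\right) > D_1\right] \le \epsilon_1 \tag{20}$$

$$\Pr\left[d_2\left(X^n, g_2^{(n)}(f_1^{(n)}(X^n), f_2^{(n)}(X^n))\right) > D_2\right] \le \epsilon_2, \tag{21}$$

and such a code is called an $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$-code.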

Note that we consider the two error events separately, unlike in the definition of a successive refinement code
in Section II-B. Our goal is to characterize the achievable $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$ region in general. Motivated
by successive refinability, we define strong successive refinability as follows.

Definition 4: The source is strongly successively refinable at $(D_1, D_2, \epsilon_1, \epsilon_2)$ if $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$ is
achievable for some $M_1, M_2$ satisfying

$$\frac{1}{n}\log M_1 = R_1(P, D_1) + \sqrt{\frac{V_1(P, D_1)}{n}}\,Q^{-1}(\epsilon_1) + o\left(\frac{1}{\sqrt{n}}\right) \tag{22}$$

$$\frac{1}{n}\log M_1 M_2 = R_2(P, D_2) + \sqrt{\frac{V_2(P, D_2)}{n}}\,Q^{-1}(\epsilon_2) + o\left(\frac{1}{\sqrt{n}}\right) \tag{23}$$

where $R_i(P, D_i)$ and $V_i(P, D_i)$ are the point-to-point rate-distortion function and the source dispersion for the
$i$-th decoder. Furthermore, if the source is strongly successively refinable at $(D_1, D_2, \epsilon_1, \epsilon_2)$ for all non-degenerate
$D_1, D_2, \epsilon_1, \epsilon_2$ (i.e., $R_{P,D_1,\epsilon_1}(n) < R_{P,D_2,\epsilon_2}(n)$), then we say the source is strongly successively refinable.

While standard successive refinability implies that the successive refinement structure does not cause any loss
in the compression rate (asymptotically), strong successive refinability implies that we also do not lose from the
dispersion point of view.

Note that in order to verify that a source is strongly successively refinable, it is sufficient to find an achievability
scheme, since the converse follows from the converse in point-to-point source coding.
IV. Main Results

Our results in this section pertain to discrete memoryless sources under general distortion, as well as Gaussian
sources under quadratic distortion. The results are given here, with proofs in Section V.


A. Discrete Memoryless Source 

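Let $X^n$ be i.i.d. with distribution $P$ and let the distortion measures be $d_1 : \mathcal{X} \times \hat{\mathcal{X}}_1 \to [0, \infty)$ and $d_2 : \mathcal{X} \times \hat{\mathcal{X}}_2 \to
[0, \infty)$. We assume that the alphabets $\mathcal{X}$, $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ are finite, and therefore the distortion measures $d_1$ and $d_2$ are
bounded by some constant $d_M$. We further assume that $P(x) > 0$ for all $x \in \mathcal{X}$, since one can remove from $\mathcal{X}$ any source
symbol that has zero probability. Then, the following theorem provides the achievable rates including the
second order term:

Theorem 5 (Achievability for Discrete Memoryless Source): Assume that both $R_1(P, D_1)$ and $R(P, D_1, D_2)$ are
continuously twice differentiable with respect to $D_1$, $D_2$ and the elements of $P$ in some neighborhood of $(P, D_1, D_2)$.
Then, there exists an $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$-code such that

$$\frac{1}{n}\log M_1 = R_1(P, D_1) + \sqrt{\frac{V_1(P, D_1)}{n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{24}$$

$$\frac{1}{n}\log M_1 M_2 = R(P, D_1, D_2) + \sqrt{\frac{V(P, D_1, D_2)}{n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right) \tag{25}$$

where

$$V_1(P, D_1) \triangleq \mathrm{VAR}\left[R_1'(X, D_1)\right] \tag{26}$$

$$= \sum_{x \in \mathcal{X}} P(x)\left(R_1'(x, D_1)\right)^2 - \left[\sum_{x \in \mathcal{X}} P(x) R_1'(x, D_1)\right]^2 \tag{27}$$

$$V(P, D_1, D_2) \triangleq \mathrm{VAR}\left[R'(X, D_1, D_2)\right] \tag{28}$$

$$= \sum_{x \in \mathcal{X}} P(x)\left(R'(x, D_1, D_2)\right)^2 - \left[\sum_{x \in \mathcal{X}} P(x) R'(x, D_1, D_2)\right]^2. \tag{29}$$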

Similarly to Theorem 1, $R_1'(x, D_1)$ is the derivative of $R_1(P, D_1)$ with respect to the probability $P(x)$ and
$R'(x, D_1, D_2)$ is the derivative$^3$ of $R(P, D_1, D_2)$ with respect to the probability $P(x)$:

$$R_1'(x, D_1) = \left[\frac{\partial R_1(Q, D_1)}{\partial Q(x)}\right]_{Q=P} \tag{30}$$

$$R'(x, D_1, D_2) = \left[\frac{\partial R(Q, D_1, D_2)}{\partial Q(x)}\right]_{Q=P} \tag{31}$$


By applying the above theorem to the special case where $R(\tilde{P}, D_1, D_2) = R_2(\tilde{P}, D_2)$ for all $\tilde{P}$ in some
neighborhood of $P$, we get the following corollary.


$^3$ Similar to the definition of $R'(x, D)$, we can define $R'(x, D_1, D_2)$ using an extension. Then, $R'(x, D_1, D_2)$ and $V(P, D_1, D_2)$ are well
and uniquely defined as well, where details are given in Appendix A.


















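Corollary 6: Suppose $R_i(P, D_i)$ is continuously twice differentiable with respect to $D_i$ and the elements of $P$
in some neighborhood of $(P, D_i)$ for $i \in \{1, 2\}$. If all sources $\tilde{P}$ in some neighborhood of $P$ are successively
refinable at $(D_1, D_2)$, then there exists an $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$-code such that

$$\frac{1}{n}\log M_1 = R_1(P, D_1) + \sqrt{\frac{V_1(P, D_1)}{n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{32}$$

$$\frac{1}{n}\log M_1 M_2 = R_2(P, D_2) + \sqrt{\frac{V_2(P, D_2)}{n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right) \tag{33}$$

i.e., the source $P$ is strongly successively refinable at $(D_1, D_2, \epsilon_1, \epsilon_2)$.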

The corollary holds because $R(\tilde{P}, D_1, D_2) = R_2(\tilde{P}, D_2)$ for all $\tilde{P}$ in some neighborhood of $P$ implies that their
derivatives at $(P, D_1, D_2)$ coincide, i.e.,

$$\left[\frac{\partial R_2(Q, D_2)}{\partial Q(x)}\right]_{Q=P} = \left[\frac{\partial R(Q, D_1, D_2)}{\partial Q(x)}\right]_{Q=P}. \tag{34}$$

Since the source dispersion is the variance of the derivatives, we have $V(P, D_1, D_2) = V_2(P, D_2)$.

Remark 1: Any discrete memoryless source under the Hamming distortion measure is successively refinable. Therefore,
Corollary 6 implies that any discrete memoryless source under Hamming distortion is also strongly successively
refinable provided $R(P, D)$ is appropriately differentiable. Note that the size of the set $\{D : R(P, D)$ is not
differentiable$\}$ is at most $|\mathcal{X}|$ [?].


B. Gaussian Memoryless Source 

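Let $X^n$ be an i.i.d. $\mathcal{N}(0, \sigma^2)$ source, and suppose the distortion measure is quadratic (at both decoders).

Theorem 7 (Achievability for Gaussian Memoryless Source): The memoryless Gaussian source under quadratic
distortion is strongly successively refinable, i.e., for $\sigma^2 > D_1 > D_2$, there exists an $(n, M_1, M_2, D_1, D_2, \epsilon_1, \epsilon_2)$-code
such that

$$\frac{1}{n}\log M_1 = \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{35}$$

$$\frac{1}{n}\log M_1 M_2 = \frac{1}{2}\log\frac{\sigma^2}{D_2} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{36}$$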
V. Proof


A. Method of Types 

Our proofs for finite alphabet sources rely heavily on the method of types [?]. In this section, we briefly review
its notation and the results that we use. Without loss of generality we assume $\mathcal{X} = \{1, 2, \ldots, r_{\mathcal{X}}\}$. For any sequence
$x^n \in \mathcal{X}^n$, let $N(a|x^n)$ be the number of occurrences of the symbol $a \in \mathcal{X}$ in the sequence $x^n$. Let the type of a sequence $x^n$ be the
$r_{\mathcal{X}}$-dimensional vector $P_{x^n} = (N(1|x^n)/n, N(2|x^n)/n, \ldots, N(r_{\mathcal{X}}|x^n)/n)$. Then, let $\mathcal{P}_n(\mathcal{X})$ denote the set of all types
on $\mathcal{X}^n$, i.e., $\mathcal{P}_n(\mathcal{X}) = \{P_{x^n} \mid x^n \in \mathcal{X}^n\}$. The size of the set $\mathcal{P}_n(\mathcal{X})$ is at most polynomial in $n$; more precisely,

$$|\mathcal{P}_n(\mathcal{X})| \le (n + 1)^{r_{\mathcal{X}}}. \tag{37}$$


















For a given type $P$, define the type class of $P$ by

$$T_P = \{x^n \in \mathcal{X}^n \mid P_{x^n} = P\}. \tag{38}$$

We can also define the type class $T_{x^n} = \{\tilde{x}^n \in \mathcal{X}^n \mid P_{\tilde{x}^n} = P_{x^n}\}$ using a sequence $x^n \in \mathcal{X}^n$. We can bound the size
of a type class:

$$\frac{1}{(n + 1)^{r_{\mathcal{X}}}}\exp(nH(P)) \le |T_P| \le \exp(nH(P)) \tag{39}$$

where $H(P)$ denotes the entropy of a random variable with law $P$.
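The counting facts (37) and (39) can be checked by brute force for small block lengths. The following sketch represents a type by its vector of counts $N(a|x^n)$ (dividing by $n$ gives the type itself); the helper names are illustrative.

```python
import math
from collections import Counter
from itertools import product


def type_counts(seq, alphabet):
    """Counts N(a | x^n) for each symbol a; dividing by n gives the type."""
    counts = Counter(seq)
    return tuple(counts[a] for a in alphabet)


def all_types(n, alphabet):
    """Set of all types (as count vectors) of length-n sequences, by enumeration."""
    return {type_counts(s, alphabet) for s in product(alphabet, repeat=n)}


def type_class_size(counts):
    """|T_P| is the multinomial coefficient n! / prod_a N(a)!."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return size


def entropy(counts):
    """Entropy H(P) in nats of the type with the given counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


alphabet = (1, 2, 3)
n = 5
types = all_types(n, alphabet)
counts = (2, 2, 1)
upper = math.exp(n * entropy(counts))
lower = upper / (n + 1) ** len(alphabet)
```

Here `len(types)` can be compared with the bound $(n+1)^{r_{\mathcal{X}}}$ in (37), and `type_class_size(counts)` with `lower` and `upper` from (39).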

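We further consider conditional types. Let $\mathcal{Y}$ be an alphabet, where we also assume $\mathcal{Y} = \{1, 2, \ldots, r_{\mathcal{Y}}\}$
to be finite. Consider a stochastic kernel $W : \mathcal{X} \to \mathcal{Y}$. We say that $y^n \in \mathcal{Y}^n$ has conditional type $W$ given $x^n \in \mathcal{X}^n$
if

$$N(a, b | x^n, y^n) = N(a | x^n) W(b | a) \quad \text{for all } a \in \mathcal{X},\ b \in \mathcal{Y}. \tag{40}$$

Then, we can define the conditional type class of $W$ given $x^n \in \mathcal{X}^n$ by

$$T_W(x^n) = \{y^n \in \mathcal{Y}^n \mid y^n \text{ has conditional type } W \text{ given } x^n\}. \tag{41}$$

We can also bound the size of a conditional type class. For a sequence $x^n \in \mathcal{X}^n$ with type $P$, and for a conditional
type $W$, we have

$$\frac{1}{(n + 1)^{r_{\mathcal{X}} r_{\mathcal{Y}}}}\exp(nH(P|W)) \le |T_W(x^n)| \le \exp(nH(P|W)). \tag{42}$$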

Here $H(P|W)$ denotes the conditional entropy of $V$ given $U$, where $(U, V)$ are random variables with joint law $P \times W$.


B. Proof of Theorem 5

A key tool used in the proof is a refined version of the type covering lemma [?]. We say a set $B$ $D$-covers
a set $A$ if for all $a \in A$, there exists an element $b \in B$ such that $d(a, b) \le D$. In the successive refinement setting,
we need to cover a set in a successive manner.

Definition 5: Let $d_1 : \mathcal{A} \times \mathcal{B} \to [0, \infty)$ and $d_2 : \mathcal{A} \times \mathcal{C} \to [0, \infty)$ be distortion measures. Consider sets $A \subset \mathcal{A}$,
$B \subset \mathcal{B}$ and $C_b \subset \mathcal{C}$ for all $b \in B$. We say $B$ and $\{C_b\}_{b \in B}$ successively $(D_1, D_2)$-cover a set $A$ if for all $a \in A$,
there exist $b \in B$ and $c \in C_b$ such that $d_1(a, b) \le D_1$ and $d_2(a, c) \le D_2$.
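Definition 5 can be checked directly on small examples. The sketch below is a brute-force test of successive covering; the toy instance (binary triples under normalized Hamming distortion) and all names are illustrative assumptions, not part of the proof.

```python
from itertools import product


def hamming(a, b):
    """Normalized Hamming distortion between two equal-length tuples."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def successively_covers(A, B, C, d1, d2, D1, D2):
    """True if B and {C[b]} successively (D1, D2)-cover A (Definition 5).

    C maps each first-layer element b in B to its refinement set C_b.
    """
    return all(
        any(
            d1(a, b) <= D1 and any(d2(a, c) <= D2 for c in C[b])
            for b in B
        )
        for a in A
    )


# Toy instance: all binary triples, first layer {000, 111} with D1 = 1/3,
# second layer refines each first-layer word to exact reconstruction (D2 = 0).
A = list(product((0, 1), repeat=3))
B = [(0, 0, 0), (1, 1, 1)]
C_good = {b: [a for a in A if hamming(a, b) <= 1 / 3] for b in B}
C_bad = {b: [b] for b in B}
```

With `C_good`, every sequence is within distance 1/3 of a first-layer word whose refinement set contains the sequence itself, so the cover succeeds; with `C_bad`, the refinement sets are too small.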

The following lemma provides an upper bound on the minimum size of sets that successively $(D_1, D_2)$-cover a type
class $T_P$.

Lemma 8 (Refined Covering Lemma): For fixed $n$, let $P \in \mathcal{P}_n(\mathcal{X})$ be a type on $\mathcal{X}$ where $P(x) > 3/n$ for all
$x \in \mathcal{X}$. Suppose $\|\nabla R(P, D_1, D_2)\|$ is bounded in some neighborhood of $(D_1, D_2)$, where

$$\nabla R(P, D_1, D_2) = \left(\frac{\partial}{\partial D_1} R(P, D_1, D_2),\ \frac{\partial}{\partial D_2} R(P, D_1, D_2)\right). \tag{43}$$

Then for $D_1, D_2 \in (0, d_M)$, there exist sets $B_1 \subset \hat{\mathcal{X}}_1^n$ and $B_2(\hat{x}_1^n) \subset \hat{\mathcal{X}}_2^n$ for each $\hat{x}_1^n \in B_1$, where $B_1$ and
$\{B_2(\hat{x}_1^n)\}_{\hat{x}_1^n \in B_1}$ successively $(D_1, D_2)$-cover $T_P$ with the following properties:

• The size of $B_1$ is upper bounded:

$$\frac{1}{n}\log|B_1| \le R_1(P, D_1) + k_1\frac{\log n}{n}. \tag{44}$$

• For all $\hat{x}_1^n \in B_1$, the size of $B_2(\hat{x}_1^n)$ is also bounded:

$$\frac{1}{n}\log\left(|B_1| \cdot |B_2(\hat{x}_1^n)|\right) \le R(P, D_1, D_2) + k_2\frac{\log n}{n}, \tag{45}$$

where $k_1$ and $k_2$ are universal constants, i.e., they do not depend on the distribution $P$ or $n$.

The proof of Lemma 8 is given in Appendix B. The following corollary provides a successive refinement scheme
using $B_1$ and $\{B_2(\hat{x}_1^n)\}_{\hat{x}_1^n \in B_1}$ from Lemma 8.

Corollary 9: For sequence length $n$ and type $Q \in \mathcal{P}_n(\mathcal{X})$, let $R$ satisfy $R \ge R_1(Q, D_1) + k_1 \log n/n$. Then,
there exists a coding scheme for $T_Q$ such that

• Encoding functions are $f_{Q,1} : T_Q \to \{1, \ldots, M_{Q,1}\}$ and $f_{Q,2} : T_Q \to \{1, \ldots, M_{Q,2}\}$.

• Decoder 1 and Decoder 2 employ

$$g_{Q,1} : \{1, \ldots, M_{Q,1}\} \to \hat{\mathcal{X}}_1^n \tag{46}$$

$$g_{Q,2} : \{1, \ldots, M_{Q,1}\} \times \{1, \ldots, M_{Q,2}\} \to \hat{\mathcal{X}}_2^n \tag{47}$$

respectively.

• For all $x^n \in T_Q$, encoding and decoding functions satisfy

$$d_1\left(x^n, g_{Q,1}(f_{Q,1}(x^n))\right) \le D_1 \tag{48}$$

$$d_2\left(x^n, g_{Q,2}(f_{Q,1}(x^n), f_{Q,2}(x^n))\right) \le D_2. \tag{49}$$

• The numbers of messages are bounded:

$$R \le \frac{1}{n}\log M_{Q,1} \le R + \frac{\log n}{n} \tag{50}$$

$$\frac{1}{n}\log M_{Q,1} M_{Q,2} \le R(Q, D_1, D_2) + (k_2 + 1)\frac{\log n}{n}. \tag{51}$$

The proof of Corollary 9 is given in Appendix C.

Let us now describe the achievability scheme. Similar to the idea from [?], we will consider four cases
according to the type $Q$ of the input sequence $x^n$. For each case, the encoding will be done in a different manner.
Before specifying the four cases, we need to define $\Delta R_1$ and $\Delta R_2$. Let $\Delta R_1$ be the infimal value such that the
probability of $\{R_1(P_{X^n}, D_1) > R_1(P, D_1) + \Delta R_1\}$ is smaller than $\epsilon_1$, and $\Delta R_2$ be the infimal value such that
the probability of $\{R(P_{X^n}, D_1, D_2) > R(P, D_1, D_2) + \Delta R_2\}$ is smaller than $\epsilon_2$. Recall that $P_{X^n}$ denotes the type
of $X^n$. The error occurs at decoder 1 if and only if $R_1(P_{X^n}, D_1) > R_1(P, D_1) + \Delta R_1$, and therefore the probability
of error at decoder 1 is less than $\epsilon_1$. Similarly, the error occurs at decoder 2 if and only if $R(P_{X^n}, D_1, D_2) >
R(P, D_1, D_2) + \Delta R_2$, and therefore the probability of error at decoder 2 is less than $\epsilon_2$. The following lemma bounds
$\Delta R_1$ and $\Delta R_2$.








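Lemma 10:

$$\Delta R_1 = \sqrt{\frac{V_1(P, D_1)}{n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{52}$$

$$\Delta R_2 = \sqrt{\frac{V(P, D_1, D_2)}{n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right) \tag{53}$$

The proof follows directly from [?, Lemma 3].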

We are ready to define four cases based on the type of the source sequence, as well as the corresponding encoding
schemes.

1) $Q \in \mathcal{A}^{(0,0)} \triangleq \{Q \in \mathcal{P}_n(\mathcal{X}) : R_1(Q, D_1) - R_1(P, D_1) \le \Delta R_1,\ R(Q, D_1, D_2) - R(P, D_1, D_2) \le \Delta R_2\}$.

In this case, both decoders decode successfully. Since $R_1(Q, D_1) \le R_1(P, D_1) + \Delta R_1$, by Corollary 9 there
exist encoding and decoding functions $f_{Q,1}, f_{Q,2}, g_{Q,1}, g_{Q,2}$ such that

$$d_1\left(x^n, g_{Q,1}(f_{Q,1}(x^n))\right) \le D_1 \tag{54}$$

$$d_2\left(x^n, g_{Q,2}(f_{Q,1}(x^n), f_{Q,2}(x^n))\right) \le D_2 \tag{55}$$

for all $x^n \in T_Q$ and

$$R_1(P, D_1) + \Delta R_1 + k_1\frac{\log n}{n} \le \frac{1}{n}\log M_{Q,1}^{(0,0)} \le R_1(P, D_1) + \Delta R_1 + (k_1 + 1)\frac{\log n}{n} \tag{56}$$

$$\frac{1}{n}\log M_{Q,1}^{(0,0)} M_{Q,2}^{(0,0)} \le R(Q, D_1, D_2) + (k_2 + 1)\frac{\log n}{n}. \tag{57}$$

We emphasize that we have $R_1(P, D_1)$ instead of $R_1(Q, D_1)$ in (56). This is because we need to aggregate
the codewords at the end of the proof. More precisely, we have to fix the number of codewords for decoder
1 in order to bound the number of codewords only for decoder 2.

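2) $Q \in \mathcal{A}^{(0,1)} \triangleq \{Q \in \mathcal{P}_n(\mathcal{X}) : R_1(Q, D_1) - R_1(P, D_1) \le \Delta R_1,\ R(Q, D_1, D_2) - R(P, D_1, D_2) > \Delta R_2\}$.

For those $Q$, the encoder only $D_1$-covers $T_Q$. Thus, decoder 1 will decode successfully and decoder 2 will
declare an error. In this case, we do not need a message for decoder 2 and we can think of $M_{Q,2}^{(0,1)} = 1$. For
decoder 1, by Theorem 1, we can find encoding and decoding functions $f_1^{(0,1)} : \mathcal{X}^n \to \{1, \ldots, M_1^{(0,1)}\}$ and
$g_1^{(0,1)} : \{1, \ldots, M_1^{(0,1)}\} \to \hat{\mathcal{X}}_1^n$ such that

$$d_1\left(x^n, g_1^{(0,1)}(f_1^{(0,1)}(x^n))\right) \le D_1 \tag{58}$$

for all $Q \in \mathcal{A}^{(0,1)}$ and $x^n \in T_Q$, where

$$\frac{1}{n}\log M_1^{(0,1)} = R_1(P, D_1) + \Delta R_1 + O\left(\frac{\log n}{n}\right). \tag{59}$$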

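3) $Q \in \mathcal{A}^{(1,0)} \triangleq \{Q \in \mathcal{P}_n(\mathcal{X}) : R_1(Q, D_1) - R_1(P, D_1) > \Delta R_1,\ R(Q, D_1, D_2) - R(P, D_1, D_2) \le \Delta R_2\}$.

In this case, the encoder only $D_2$-covers $T_Q$. Thus, decoder 2 will decode successfully and decoder 1 will
declare an error. In this case, we do not need a message for decoder 1. However, because of the structure of
the successive refinement code, we need to reformulate the point-to-point code for the second decoder into the
form of a successive refinement code. More precisely, we can find functions $\tilde{f}_Q^{(1,0)} : \mathcal{X}^n \to \{1, \ldots, \tilde{M}_{Q,2}^{(1,0)}\}$
and $\tilde{g}_Q^{(1,0)} : \{1, \ldots, \tilde{M}_{Q,2}^{(1,0)}\} \to \hat{\mathcal{X}}_2^n$ such that

$$d_2\left(x^n, \tilde{g}_Q^{(1,0)}(\tilde{f}_Q^{(1,0)}(x^n))\right) \le D_2 \tag{60}$$

for all $x^n \in T_Q$, where

$$\frac{1}{n}\log \tilde{M}_{Q,2}^{(1,0)} \le R_2(Q, D_2) + O\left(\frac{\log n}{n}\right). \tag{61}$$

Let $M_{Q,1}^{(1,0)}$ and $M_{Q,2}^{(1,0)}$ be such that

$$R_1(P, D_1) + \Delta R_1 + k_1\frac{\log n}{n} \le \frac{1}{n}\log M_{Q,1}^{(1,0)} \le R_1(P, D_1) + \Delta R_1 + (k_1 + 1)\frac{\log n}{n} \tag{62}$$

$$\frac{1}{n}\log M_{Q,1}^{(1,0)} M_{Q,2}^{(1,0)} \le \frac{1}{n}\log \tilde{M}_{Q,2}^{(1,0)} + \frac{\log n}{n}. \tag{63}$$

For simplicity, we neglect the fact that the numbers of messages are integers, since this increases the rate
by at most $\log n/n$ bits/symbol. Let $h$ be a one-to-one mapping from $\{1, \ldots, M_{Q,1}^{(1,0)}\} \times \{1, \ldots, M_{Q,2}^{(1,0)}\}$
to $\{1, \ldots, \tilde{M}_{Q,2}^{(1,0)}\}$. Then, we can define encoding and decoding functions $f_{Q,1}^{(1,0)} : \mathcal{X}^n \to \{1, \ldots, M_{Q,1}^{(1,0)}\}$,
$f_{Q,2}^{(1,0)} : \mathcal{X}^n \to \{1, \ldots, M_{Q,2}^{(1,0)}\}$ and $g_{Q,2}^{(1,0)} : \{1, \ldots, M_{Q,1}^{(1,0)}\} \times \{1, \ldots, M_{Q,2}^{(1,0)}\} \to \hat{\mathcal{X}}_2^n$ where

$$\left(f_{Q,1}^{(1,0)}(x^n), f_{Q,2}^{(1,0)}(x^n)\right) = h^{-1}\left(\tilde{f}_Q^{(1,0)}(x^n)\right) \tag{64}$$

$$g_{Q,2}^{(1,0)}(m_1, m_2) = \tilde{g}_Q^{(1,0)}(h(m_1, m_2)). \tag{65}$$

Note that the first message is useless for decoder 1, but we do not care since it will declare an error anyway.
On the other hand, decoder 2 will decode both $m_1$ and $m_2$ successfully.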

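4) $Q \in \mathcal{A}^{(1,1)} \triangleq \{Q \in \mathcal{P}_n(\mathcal{X}) : R_1(Q, D_1) - R_1(P, D_1) > \Delta R_1,\ R(Q, D_1, D_2) - R(P, D_1, D_2) > \Delta R_2\}$.

The encoder sends nothing and both decoders will declare errors. We can assume $M_{Q,1}^{(1,1)} = M_{Q,2}^{(1,1)} = 1$.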
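The four cases amount to a simple test on the type of the source sequence. The sketch below illustrates this test for a binary source under Hamming distortion, where (by successive refinability) $R(Q, D_1, D_2) = R_2(Q, D_2)$ and $R(Q, D) = h(q) - h(D)$ for $D < \min(q, 1-q)$. The thresholds `dR1` and `dR2` stand for $\Delta R_1$ and $\Delta R_2$ and are supplied by the caller; all names are illustrative.

```python
import math


def h(q):
    """Binary entropy in nats."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)


def rate(q, D):
    """Binary-Hamming rate-distortion function of a Bernoulli(q) source, in nats."""
    return h(q) - h(D) if D < min(q, 1 - q) else 0.0


def classify_type(q, p, D1, D2, dR1, dR2):
    """Return the case label (e1, e2) of a type Q = Bernoulli(q) for source law Bernoulli(p).

    e1 = 1 means decoder 1 declares an error; e2 = 1 means decoder 2 declares an error.
    Assumes a successively refinable source, so R(Q, D1, D2) = R_2(Q, D2).
    """
    e1 = int(rate(q, D1) - rate(p, D1) > dR1)
    e2 = int(rate(q, D2) - rate(p, D2) > dR2)
    return (e1, e2)
```

For example, a type equal to the source law always falls in case $(0,0)$, while a type whose entropy exceeds that of the source by more than both thresholds falls in case $(1,1)$.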

Finally, we merge all encoding functions together. Given a source sequence $x^n$, the encoder describes the type of
the sequence as a part of the first message using $|\mathcal{X}| \log(n + 1)$ bits. This affects the rates by at most $O(\log n/n)$ bits/symbol.
Based on the type of the sequence, it employs an encoding function accordingly, as described above. Since
the decoder also knows the type of the sequence, it can employ the corresponding decoding function. Since all of
$M_{Q,1}^{(0,0)}$, $M_1^{(0,1)}$ and $M_{Q,1}^{(1,0)}$ have the same upper bound, we can bound $M_1$:

$$\frac{1}{n}\log M_1 \le R_1(P, D_1) + \Delta R_1 + (k_1 + 1)\frac{\log n}{n} + |\mathcal{X}|\frac{\log(n + 1)}{n} \tag{66}$$

$$\le R_1(P, D_1) + \sqrt{\frac{V_1(P, D_1)}{n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right). \tag{67}$$

Similarly, we can show that

$$\frac{1}{n}\log M_1 M_2 \le R(P, D_1, D_2) + \sqrt{\frac{V(P, D_1, D_2)}{n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{68}$$

This concludes the proof.
C. Proof of Theorem 7

Instead of the type covering arguments that we used in the previous section, we use a sphere covering result
for Gaussian sources.

Theorem 11 ([?]): There is an absolute constant $k_3$ such that, if $R > 1$ and $n \ge 9$, any $n$-dimensional sphere
of radius $R$ can be covered by fewer than $k_3 n^{5/2} R^n$ spheres of radius 1.

For simplicity, we refer to a sphere of radius $r$ as an $r$-ball and denote by $B(x^n, r) = \{\hat{x}^n : \|x^n - \hat{x}^n\|_2^2 \le r^2\}$ the
set of points in the sphere centered at $x^n$ with radius $r$. The above theorem immediately implies the following
corollary.

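Corollary 12: For $n \ge 9$ and $R_1 > R_2 > 0$, we can find a set $\mathcal{C} \subset \mathbb{R}^n$ of size $M$ that satisfies:

• For all $x^n \in B(0, R_1)$, there is an element $\hat{x}^n \in \mathcal{C}$ such that $x^n \in B(\hat{x}^n, R_2)$.

• The size $M$ of the set is upper bounded by

$$\frac{1}{n}\log M \le \frac{1}{2}\log\frac{R_1^2}{R_2^2} + \frac{5}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right). \tag{69}$$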
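Let $r_1$ and $r_2$ be the radii of the balls such that $\Pr[X_1^2 + \cdots + X_n^2 > r_1^2] = \epsilon_1$ and $\Pr[X_1^2 + \cdots + X_n^2 > r_2^2] = \epsilon_2$.

First, consider the case $\epsilon_1 \le \epsilon_2$. It is clear that $Q^{-1}(\epsilon_1) \ge Q^{-1}(\epsilon_2)$ and $r_1 \ge r_2$. We can further divide this case
into the following three cases.

1) $X^n \in B(0, r_2)$, i.e., $X_1^2 + \cdots + X_n^2 \le r_2^2$. In this case, we design a code such that both decoders can decode
successfully. Let $\mathcal{C}_1^{(0,0)} \subset \mathbb{R}^n$ with $|\mathcal{C}_1^{(0,0)}| = M_1^{(0,0)}$ be the set that satisfies:

• $B(0, r_2) \subset \bigcup_{\hat{x}^n \in \mathcal{C}_1^{(0,0)}} B(\hat{x}^n, \sqrt{nD_1})$

• $\frac{1}{n}\log M_1^{(0,0)} \le \frac{1}{2}\log\frac{r_2^2}{nD_1} + \frac{5}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right)$

which implies that there are $M_1^{(0,0)}$ $\sqrt{nD_1}$-balls that cover the $r_2$-ball. An upper bound on $r_2$
can be found similarly to the proof of Theorem 2. Since $Q^{-1}(\epsilon_1) \ge Q^{-1}(\epsilon_2)$, it is clear that

$$\frac{1}{n}\log M_1^{(0,0)} \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right). \tag{70}$$

Similarly, we can cover a $\sqrt{nD_1}$-ball with $M_2^{(0,0)}$ $\sqrt{nD_2}$-balls. In other words, there
exists $\mathcal{C}_2^{(0,0)} \subset \mathbb{R}^n$ with $|\mathcal{C}_2^{(0,0)}| = M_2^{(0,0)}$ that satisfies:

• $B(0, \sqrt{nD_1}) \subset \bigcup_{\tilde{x}^n \in \mathcal{C}_2^{(0,0)}} B(\tilde{x}^n, \sqrt{nD_2})$

• $\frac{1}{n}\log M_2^{(0,0)} \le \frac{1}{2}\log\frac{D_1}{D_2} + O\left(\frac{\log n}{n}\right)$

where the upper bound on $M_2^{(0,0)}$ is because of Corollary 12.

Thus, if $x^n \in B(0, r_2)$, then we can find $\hat{x}_1^n \in \mathcal{C}_1^{(0,0)}$ such that $x^n \in B(\hat{x}_1^n, \sqrt{nD_1})$, which implies
$(1/n)\|x^n - \hat{x}_1^n\|_2^2 \le D_1$. Furthermore, since $x^n - \hat{x}_1^n \in B(0, \sqrt{nD_1})$, we can find $\tilde{x}^n \in \mathcal{C}_2^{(0,0)}$ such that
$x^n - \hat{x}_1^n \in B(\tilde{x}^n, \sqrt{nD_2})$, which implies $(1/n)\|x^n - \hat{x}_1^n - \tilde{x}^n\|_2^2 \le D_2$. Finally, we can take $\hat{x}_2^n = \hat{x}_1^n + \tilde{x}^n$,
and we get $(1/n)\|x^n - \hat{x}_2^n\|_2^2 \le D_2$.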
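The additive structure of the two-layer reconstruction (quantize the source, then quantize the residual and add it back) can be illustrated numerically. The sketch below replaces the sphere covering codebooks of the proof with simple uniform scalar quantizers, which is an assumption made purely for illustration: a step size of $2\sqrt{D}$ guarantees a per-symbol squared error of at most $D$. All function names are hypothetical.

```python
import math
import random


def quantize(x, step):
    """Round each coordinate to the nearest multiple of `step`."""
    return [step * round(v / step) for v in x]


def mse(x, y):
    """Normalized squared error (1/n) * ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def two_stage_encode(x, D1, D2):
    """Two-layer reconstruction: x1 meets distortion D1, x2 = x1 + x_tilde meets D2.

    Stand-in for the sphere covering codebooks: uniform scalar quantizers
    with step 2*sqrt(D), whose per-coordinate error is at most sqrt(D).
    """
    x1 = quantize(x, 2.0 * math.sqrt(D1))            # first layer
    residual = [a - b for a, b in zip(x, x1)]          # x - x1
    x_tilde = quantize(residual, 2.0 * math.sqrt(D2))  # second layer covers the residual
    x2 = [a + b for a, b in zip(x1, x_tilde)]          # refined reconstruction
    return x1, x2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
x1, x2 = two_stage_encode(x, D1=0.25, D2=0.01)
```

The encoder never searches over pairs of codewords jointly: the second layer only needs to describe the residual left by the first.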
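2) $X^n \in B(0, r_1)$ but $X^n \notin B(0, r_2)$, i.e., $r_2^2 < X_1^2 + \cdots + X_n^2 \le r_1^2$.

We will only send a message to decoder 1, and decoder 2 will declare an error. We can cover the $r_1$-ball with
$M_1^{(0,1)}$ $\sqrt{nD_1}$-balls where

$$\frac{1}{n}\log M_1^{(0,1)} \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right). \tag{71}$$

Therefore, there exists $\mathcal{C}_1^{(0,1)}$ that satisfies:

• $B(0, r_1) \subset \bigcup_{\hat{x}^n \in \mathcal{C}_1^{(0,1)}} B(\hat{x}^n, \sqrt{nD_1})$

• $\frac{1}{n}\log M_1^{(0,1)} \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right)$.

We can think of $M_2^{(0,1)}$ as one.

3) $X^n \notin B(0, r_1)$ and $X^n \notin B(0, r_2)$, i.e., $r_1^2 < X_1^2 + \cdots + X_n^2$.

The encoder does not send any messages and both decoders will declare an error. We can think of both $M_1^{(0,2)}$
and $M_2^{(0,2)}$ as one.

Finally, we employ the codebook $\mathcal{C}_1$ obtained by combining the codebooks of the cases above, and the same for $\mathcal{C}_2$, where $|\mathcal{C}_1| = M_1$ and $|\mathcal{C}_2| = M_2$.
Then, we can see that

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{72}$$

$$\frac{1}{n}\log M_1 M_2 \le \frac{1}{2}\log\frac{\sigma^2}{D_2} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{73}$$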

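Similarly, we can consider the case $\epsilon_1 > \epsilon_2$. In this case, it is clear that $Q^{-1}(\epsilon_1) < Q^{-1}(\epsilon_2)$ and $r_1 < r_2$. We
can further divide the case into the following three cases.

1) $X^n \in B(0, r_1)$, i.e., $X_1^2 + \cdots + X_n^2 \le r_1^2$. In this case, both decoders can decode successfully. We can find
$M_1^{(1,0)}$ $\sqrt{nD_1}$-balls that cover the $r_1$-ball where

$$\frac{1}{n}\log M_1^{(1,0)} \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right). \tag{74}$$

Similar to the previous cases, we can define $\mathcal{C}_1^{(1,0)}$ to be the set of the ball centers.
Also, we can cover the $\sqrt{nD_1}$-ball with $M_2^{(1,0)}$ $\sqrt{nD_2}$-balls where

$$\frac{1}{n}\log M_2^{(1,0)} \le \frac{1}{2}\log\frac{D_1}{D_2} + O\left(\frac{\log n}{n}\right). \tag{75}$$

Since $Q^{-1}(\epsilon_1) < Q^{-1}(\epsilon_2)$, it is clear that

$$\frac{1}{n}\log M_1^{(1,0)} M_2^{(1,0)} \le \frac{1}{2}\log\frac{\sigma^2}{D_2} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{76}$$

2) $X^n \in B(0, r_2)$ but $X^n \notin B(0, r_1)$, i.e., $r_1^2 < X_1^2 + \cdots + X_n^2 \le r_2^2$.

We will only send a message to decoder 2, and decoder 1 will declare an error. We can cover the $r_2$-ball with
$\tilde{M}^{(1,1)}$ $\sqrt{nD_2}$-balls where

$$\frac{1}{n}\log \tilde{M}^{(1,1)} \le \frac{1}{2}\log\frac{\sigma^2}{D_2} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{77}$$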
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Similar to the proof of Theorem|2 we can split the message S into S 

X such that 


Aff 

-logMf’^^ < ^log-^ 

n 2 Di 


1 


■ log M, 


( 1 . 1 ) 


1 


Di 




O 


\e2) + 0 


logn 


logn 


(78) 

(79) 

(80) 


Recall that the decoder 1 does not care about the reconstruction of the source, where, on the other hand, 
decoder 2 will get both and and will be able to reconstruct the source based on 

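3) $X^n \notin B(0, r_1)$ and $X^n \notin B(0, r_2)$, i.e., $r_2^2 < X_1^2 + \cdots + X_n^2$.

We will not send any messages and both decoders will declare an error. We can think both $M_1^{(1,2)}$ and $M_2^{(1,2)}$ to be one.

Similar to the case of $\epsilon_1 < \epsilon_2$, we can combine the codebooks and get

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_1) + O\left(\frac{\log n}{n}\right) \tag{81}$$

$$\frac{1}{n}\log M_1 M_2 \le \frac{1}{2}\log\frac{\sigma^2}{D_2} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon_2) + O\left(\frac{\log n}{n}\right). \tag{82}$$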


This concludes the proof. 

Remark 2: If we have $\epsilon_1 = \epsilon_2 = \epsilon$, the radii $r_1$ and $r_2$ are the same and the proof can be simplified. In this case, an error will occur at both decoders if and only if $X_1^2 + \cdots + X_n^2 > r^2$, where $r = r_1 = r_2$. Since both decoders share the same error events, encoding can be done successively in a simple manner and we do not have to consider the case of message splitting. More precisely, given a codebook $\{(X_1^n(i), X_2^n(j)) : 1 \le i \le M_1, 1 \le j \le M_2\}$, the encoder finds $i$ such that $(1/n)\|X^n - X_1^n(i)\|_2^2 \le D_1$ and then finds $j$ such that $(1/n)\|X^n - X_1^n(i) - X_2^n(j)\|_2^2 \le D_2$. This is the key idea of Section VI, where we use the successive refinement technique to construct a point-to-point source coding scheme with low complexity.


VI. Layered Codes 

We considered the successive refinement problem with two decoders so far. In this section, we show that the idea of successive refinement is also useful for point-to-point lossy compression where we have one encoder and one decoder. The intuition is that successive refinement coding provides a tree structure for a coding scheme, which allows low encoding complexity. More precisely, if the source is successively refinable, we can add $L - 1$ virtual mid-stage decoders and employ a successive refinement scheme for $L$ decoders without any (asymptotic) performance loss. For fixed $L$, this is a simple extension of successive refinement; however, we also provide a result for $L = L_n$ growing with $n$. Since the number of decoders $L$ corresponds to the number of levels of the tree and larger $L$ leads to lower complexity of the scheme, we gain a great advantage in terms of complexity by taking a growing $L = L_n$. Note that tree-structured vector quantization (TSVQ) has been extensively studied, and also has a successive approximation property. For example, in [21], Effros et al. combined pruned TSVQ with a universal noiseless coder, which enables progressive transmission of sources. While this approach guarantees optimality at zero distortion, it cannot achieve the rate-distortion function in general.

The precise problem description is the following. Let $n$ be the block length of the coding scheme. The codebook consists of $L$ sub-codebooks $(C_1, C_2, \ldots, C_L)$ and the $i$-th sub-codebook consists of $M_i$ codewords for $1 \le i \le L$. We consider the following encoding scheme, which we call layered coding:

• Find $c_1 \in C_1$ that minimizes some function $\psi_1(x^n, c_1)$.

• For $i \ge 2$, given $c_1, \ldots, c_{i-1}$, find $c_i \in C_i$ that minimizes $\psi_i(x^n, c_1, c_2, \ldots, c_i)$,

where $\psi_1, \ldots, \psi_L$ are simple functions that depend on the specific implementation of the scheme. One can think of $(c_1, \ldots, c_i)$ as messages for an $i$-th (virtual) decoder. The compressed representation of the source consists of a length-$L$ vector $(m_1, \ldots, m_L)$ which indicates the index of the codeword from each sub-codebook. Note that the total number of codewords is $M_1 \times \cdots \times M_L$ and the rate of the scheme is $R = \sum_{i=1}^{L} \frac{1}{n}\log M_i$. Once the decoder receives the message, it reconstructs $\hat{X}^n = \phi(m_1, \ldots, m_L)$ with some function $\phi$.

Definition 6: An $(n, L, \{M_1, \ldots, M_L\}, D, \epsilon)$-layered code is a coding scheme with $L$ sub-codebooks where the size of the $i$-th sub-codebook is $M_i$, and the probability of excess distortion $\Pr[d(X^n, \hat{X}^n) > D]$ is at most $\epsilon$.

Note that the definition of the layered code is exactly the same as that of the successive refinement code, except that the layered coding scheme only considers the distortion at the last decoder.


A. Layered Coding Schemes 

We show the existence of layered coding schemes for a Gaussian source under quadratic distortion and for a binary source under Hamming distortion. For fixed $L$, it is easy to obtain a layered coding scheme, since the sources are successively refinable in both cases and we can apply the successive refinement schemes. In this section, we generalize the result even further in two aspects. First, we consider how fast the coding rate can converge to the rate-distortion function, and provide an achievable rate including a dispersion term. Then, we allow $L$ to be a function of the block length $n$, and provide a layered coding scheme for $L = L_n$ growing with $n$. Our next theorem shows the existence of a rate-distortion achieving layered coding scheme for given $n$ and $L$.

Theorem 13: For i.i.d. Gaussian sources under quadratic distortion and i.i.d. binary sources under Hamming distortion, there exists an $(n, L, \{M_1, \ldots, M_L\}, D, \epsilon)$-layered code such that

$$\sum_{i=1}^{L} \frac{1}{n}\log M_i \le R(D) + \sqrt{\frac{V(D)}{n}}\,Q^{-1}(\epsilon) + Lk\frac{\log n}{n} + O\left(\frac{\log n}{n}\right) \tag{83}$$

for some constant $k$, where the $O(\log n/n)$ term does not depend on $D$ or $L$.

The proof and discussion of Theorem 13 are given in Section VI-A1 and Section VI-A2. Note that $Lk\log n/n$ is also in the class of $O(\log n/n)$ for constant $L$; however, we will also consider the case where $L = L_n$ grows with $n$. We would like to point out that the last $O(\log n/n)$ term remains the same even when $L = L_n$ increases as $n$ grows.

1) Gaussian source under quadratic distortion: For a Gaussian source under quadratic distortion, we can generalize Theorem 2 to the case of multiple decoders. As we mentioned in Remark 2, we choose all $\epsilon_i$ to be equal to $\epsilon$.

Lemma 14: Let a source be i.i.d. Gaussian under quadratic distortion. For all $L$, there exists an $(n, L, \{M_1, \ldots, M_L\}, D, \epsilon)$-layered code such that

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) \tag{84}$$

$$\frac{1}{n}\log M_i \le \frac{1}{2}\log\frac{D_{i-1}}{D_i} + 3\frac{\log n}{n} \quad \text{for } 2 \le i \le L \tag{85}$$

for any $D_1 > D_2 > \cdots > D_L = D$, where the $O(\log n/n)$ term depends on $\epsilon$ but not on $L$ or the $D_i$ values.

The choice of $\psi_i$ and $\phi$ will be specified in the proof. The fact that the $O(\log n/n)$ term does not depend on the specific choice of the $D_i$'s and $L$ is important in cases we consider later, where $L$ and $D_i$ vary with $n$.

Proof: Consider the successive refinement problem with target distortions $D_1 > \cdots > D_L = D$ and target excess distortion probabilities $\epsilon_1 = \cdots = \epsilon_L = \epsilon$. Given sub-codebooks $C_1, \ldots, C_L$, the basic idea of the scheme is as shown in Algorithm 1. Note that the input of the algorithm is a given sequence $x^n$ and the set of sub-codebooks $C_1, \ldots, C_L$, and the output is the collection of sub-codewords $c_{m_1}, \ldots, c_{m_L}$.

Algorithm 1 Encoding Scheme.

Set $D_1 > D_2 > \cdots > D_L = D$, and let $x^{(0)} = x^n$.
for $i = 1$ to $L$ do
    Find a codeword $c_{m_i} \in C_i$ such that $\|x^{(i-1)} - c_{m_i}\|_2^2 \le nD_i$.
    If there is no such codeword, declare an error.
    Let $x^{(i)} = x^{(i-1)} - c_{m_i}$.
end for
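The loop in Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sub-codebooks are plain lists of real vectors, the names `encode_layered` and `decode_layered` are hypothetical (they do not come from the paper), and the encoder simply takes the first codeword that meets the layer's distortion target, which is one possible search rule.

```python
def encode_layered(x, sub_codebooks, distortions):
    """Return the chosen indices (m_1, ..., m_L), or None if some layer fails."""
    n = len(x)
    residual = list(x)                      # x^(0) = x^n
    indices = []
    for codebook, d_i in zip(sub_codebooks, distortions):
        chosen = None
        for m, c in enumerate(codebook):
            # squared Euclidean distance between the residual and the codeword
            err = sum((r - ci) ** 2 for r, ci in zip(residual, c))
            if err <= n * d_i:
                chosen = m
                break
        if chosen is None:
            return None                     # "declare an error"
        indices.append(chosen)
        # x^(i) = x^(i-1) - c_{m_i}
        residual = [r - ci for r, ci in zip(residual, codebook[chosen])]
    return indices


def decode_layered(indices, sub_codebooks):
    """Reconstruct as the sum of the selected sub-codewords."""
    n = len(sub_codebooks[0][0])
    x_hat = [0.0] * n
    for m, codebook in zip(indices, sub_codebooks):
        x_hat = [a + b for a, b in zip(x_hat, codebook[m])]
    return x_hat
```

Each layer scans one sub-codebook only, so the number of distance evaluations is at most the sum of the sub-codebook sizes rather than their product.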


We construct sub-codebooks based on Corollary 12. Let $r$ be a radius such that $\Pr[X_1^2 + \cdots + X_n^2 > r^2] = \epsilon$. Similar to the proof of Theorem 7, we can find $M_1$ number of $\sqrt{nD_1}$-balls that cover the $r$-ball where

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right). \tag{86}$$

Again, the term $O(\log n/n)$ only depends on $\epsilon$, where we provide the details in Appendix D. Then, for $i \ge 2$, we can cover a $\sqrt{nD_{i-1}}$-ball with $M_i$ number of $\sqrt{nD_i}$-balls where

$$\frac{1}{n}\log M_i \le \frac{1}{2}\log\frac{D_{i-1}}{D_i} + 3\frac{\log n}{n}. \tag{87}$$

The $i$-th sub-codebook $C_i$ is a set of centers of $\sqrt{nD_i}$-balls, and therefore $|C_i| = M_i$.

Suppose the encoder found $c_{m_1}, \ldots, c_{m_{i-1}}$ successfully, which implies $\|c_{m_1} + \cdots + c_{m_{i-1}} - x^n\|_2^2 \le nD_{i-1}$. In other words, $x^n$ is in the ball with radius $\sqrt{nD_{i-1}}$ whose center is at $c_{m_1} + \cdots + c_{m_{i-1}}$. Then, by construction, we can always find $c_{m_i} \in C_i$ such that $\psi_i(x^n, c_{m_1}, \ldots, c_{m_i}) = \|c_{m_1} + \cdots + c_{m_i} - x^n\|_2^2 \le nD_i$. We can repeat the same procedure $L$ times and find $(m_1, m_2, \ldots, m_L)$.

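The error occurs if and only if the event $X_1^2 + \cdots + X_n^2 > r^2$ happens at the beginning, and therefore the excess distortion probability is $\epsilon$. The reconstruction at the decoder will be $\phi(c_{m_1}, \ldots, c_{m_L}) = c_{m_1} + \cdots + c_{m_L}$.

The overall rate of Lemma 14 can be bounded by

$$\sum_{i=1}^{L} \frac{1}{n}\log M_i \tag{88}$$

$$\le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) + \sum_{i=2}^{L}\left[\frac{1}{2}\log\frac{D_{i-1}}{D_i} + 3\frac{\log n}{n}\right] \tag{89}$$

$$= \frac{1}{2}\log\frac{\sigma^2}{D} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + 3(L-1)\frac{\log n}{n} + O\left(\frac{\log n}{n}\right). \tag{90}$$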

2) Binary source under Hamming distortion: The next lemma provides a similar result for a binary source under Hamming distortion.

Lemma 15: Let the source be i.i.d. Bern($p$) and the distortion be measured by the Hamming distortion function, where the target distortion is $D$. For large enough $n$, there is an $(n, L, \{M_1, \ldots, M_L\}, D, \epsilon)$-layered code for all $L$ and $D_1 > D_2 > \cdots > D_L = D$ such that

$$\frac{1}{n}\log M_1 \le h_2(p) - h_2(D_1) + \sqrt{\frac{V(p, D_1)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) \tag{91}$$

$$\frac{1}{n}\log M_i \le h_2(D_{i-1}) - h_2(D_i) + k_3\frac{\log n}{n}, \quad \text{for } 2 \le i \le L \tag{92}$$

where $O(\log n/n)$ only depends on $\epsilon$, we denote the dispersion of the Bern($p$) source by $V(p, D) = p(1-p)\log^2((1-p)/p)$ and the binary entropy function by $h_2(p) = -p\log p - (1-p)\log(1-p)$, and $k_3$ is a constant that does not depend on any of the variables.

Proof: Similar to the proof of Lemma 14, we can consider the successive refinement problem with target distortions $D_1 > \cdots > D_L = D$ and target excess distortion probabilities $\epsilon_1 = \cdots = \epsilon_L = \epsilon$. The basic idea of coding is very similar to the Gaussian case. The difference is that we use Hamming balls instead of $\ell_2$ balls, and therefore we need Lemma 8 instead of Corollary 12. A Hamming ball with radius $r$ is defined by

$$B_n(r) = \left\{ y^n \in \{0,1\}^n : \sum_{i=1}^{n} y_i \le r \right\}. \tag{93}$$

Given sub-codebooks $C_1, \ldots, C_L$, the basic idea of the achievability scheme is the following:


Algorithm 2 Encoding Scheme.

Set $D_1 > D_2 > \cdots > D_L = D$, and let $x^{(0)} = x^n$.
for $i = 1$ to $L$ do
    Find the codeword $c_{m_i} \in C_i$ such that $d(x^{(i-1)}, c_{m_i}) \le D_i$.
    If there is no such codeword, declare an error.
    Let $x^{(i)} = x^{(i-1)} \oplus c_{m_i}$.
end for
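The binary scheme in Algorithm 2 differs from the Gaussian one only in the ball shape and the residual operation. A minimal sketch follows, assuming sequences are lists of 0/1 integers and that the distortion test compares the Hamming distance against $nD_i$ (equivalently, the normalized distance against $D_i$). The function names are hypothetical and the first-match search rule is an illustrative choice.

```python
def hamming_distance(a, b):
    """Number of positions where two binary sequences differ."""
    return sum(ai ^ bi for ai, bi in zip(a, b))


def encode_layered_binary(x, sub_codebooks, distortions):
    """Return the chosen indices, or None if some layer cannot cover the residual."""
    n = len(x)
    residual = list(x)                                  # x^(0) = x^n
    indices = []
    for codebook, d_i in zip(sub_codebooks, distortions):
        chosen = next((m for m, c in enumerate(codebook)
                       if hamming_distance(residual, c) <= n * d_i), None)
        if chosen is None:
            return None                                 # "declare an error"
        indices.append(chosen)
        # x^(i) = x^(i-1) XOR c_{m_i}
        residual = [r ^ ci for r, ci in zip(residual, codebook[chosen])]
    return indices


def decode_layered_binary(indices, sub_codebooks):
    """Reconstruct as the modulo-2 sum of the selected sub-codewords."""
    n = len(sub_codebooks[0][0])
    x_hat = [0] * n
    for m, codebook in zip(indices, sub_codebooks):
        x_hat = [a ^ b for a, b in zip(x_hat, codebook[m])]
    return x_hat
```

Because XOR is its own inverse, the residual after layer $i$ is exactly the error pattern between the source and the partial reconstruction, which is what the next layer encodes.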


Similar to Algorithm 1, the input of the algorithm is a given sequence $x^n$ and the set of sub-codebooks $C_1, \ldots, C_L$, and the output is the collection of sub-codewords $c_{m_1}, \ldots, c_{m_L}$.

In the first stage, similar to [?, Theorem 1], we can find a sub-codebook $C_1$ with size $M_1$ such that the excess distortion probability is smaller than $\epsilon$ and

$$\frac{1}{n}\log M_1 \le h(p) - h(D_1) + \sqrt{\frac{V(p, D_1)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right). \tag{94}$$

Similar to the Gaussian case, the term $O(\log n/n)$ only depends on $\epsilon$, where the detail is provided in Appendix E.

For $i \ge 2$ and a given type $Q$, Lemma 8 implies that there are $M_{Q,i}$ Hamming balls with radius $nD_i$ that cover all sequences of type $Q$, where

$$\frac{1}{n}\log M_{Q,i} \le R(Q, D_i) + k_1\frac{\log n}{n} \tag{95}$$

$$= h(Q(1)) - h(D_i) + k_1\frac{\log n}{n}. \tag{96}$$

Let $C_{Q,i}$ be a set of centers of Hamming balls with radius $nD_i$, and therefore $|C_{Q,i}| = M_{Q,i}$. The $i$-th sub-codebook $C_i$ is the union of the $C_{Q,i}$'s for all types $Q \in T(D_{i-1}, D_i) = \{Q \in \mathcal{P}_n(\mathcal{X}) : D_i < Q(1) \le D_{i-1}\}$ and the zero codeword $(0, 0, \ldots, 0)$, i.e.,

$$C_i = \{(0, \ldots, 0)\} \cup \bigcup_{Q \in T(D_{i-1}, D_i)} C_{Q,i}. \tag{97}$$

Then, we have

$$\frac{1}{n}\log M_i = \frac{1}{n}\log|C_i| \tag{98}$$

$$\le \frac{1}{n}\log\left(1 + \sum_{Q \in T(D_{i-1}, D_i)} M_{Q,i}\right) \tag{99}$$

$$\le \frac{1}{n}\log\left(1 + (n+1)\max_{Q \in T(D_{i-1}, D_i)} M_{Q,i}\right) \tag{100}$$

$$\le h(D_{i-1}) - h(D_i) + (k_1 + 1)\frac{\log n}{n} \tag{101}$$

where (100) is because $|T(D_{i-1}, D_i)| \le nD_{i-1} - nD_i + 1$. We can set $k_3 = k_1 + 1$.

Suppose the encoder could find $c_{m_1}, \ldots, c_{m_{i-1}}$ successfully, which implies $d(c_{m_1} \oplus \cdots \oplus c_{m_{i-1}}, x^n) \le nD_{i-1}$. In other words, $x^n$ is in the Hamming ball with radius $nD_{i-1}$ whose center is at $c_{m_1} \oplus \cdots \oplus c_{m_{i-1}}$. Then, by construction, we can always find $c_{m_i} \in C_i$ such that $\psi_i(x^n, c_{m_1}, \ldots, c_{m_i}) = d(c_{m_1} \oplus \cdots \oplus c_{m_i}, x^n) \le nD_i$. We can repeat the same procedure $L$ times and find $(m_1, m_2, \ldots, m_L)$.

The error occurs if and only if the first sub-codebook fails to cover the source $x^n$ at the beginning, and therefore the excess distortion probability is $\epsilon$. The reconstruction at the decoder will be $\phi(c_{m_1}, \ldots, c_{m_L}) = c_{m_1} \oplus \cdots \oplus c_{m_L}$.


Remark 3: We would like to point out that Lemma 15 is limited to memoryless binary sources, while Theorem 5 holds for any discrete memoryless source. The main difference is the operation between source symbols. More precisely, in Lemma 15 the source is encoded and then the "error" sequence (modulo-2 difference) is encoded again. Note that Hamming distortion is closely related to this operation. However, it is hard to generalize this idea to non-binary sources because there is no corresponding difference operation when the distortion measure is arbitrary. The modulo-$|\mathcal{X}|$ difference could work, but it is complex to analyze even when the distortion measure is still Hamming distortion.

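The overall rate of Lemma 15 can be bounded by

$$\sum_{i=1}^{L} \frac{1}{n}\log M_i \le h_2(p) - h_2(D_1) + \sqrt{\frac{V(p, D_1)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) + \sum_{i=2}^{L}\left[h_2(D_{i-1}) - h_2(D_i) + k_3\frac{\log n}{n}\right] \tag{102}$$

$$= h_2(p) - h_2(D) + \sqrt{\frac{V(p, D)}{n}}\,Q^{-1}(\epsilon) + k_3(L-1)\frac{\log n}{n} + O\left(\frac{\log n}{n}\right). \tag{103}$$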

B. Discussion

1) Rate-Distortion Trade-Off: In both (90) and (103), it is obvious that the choice of $L$ has an important role. For simplicity, we only consider the case where $M_1 = M_2 = \cdots = M_L = M$, and we neglect the fact that the number of messages $M$ is an integer. We can find $M$ and $D_1 > D_2 > \cdots > D_L$ which satisfy (84) and (85) (or (91) and (92)) with equality. For example, in the Gaussian case, we can find $M$ and $D_1, \ldots, D_L$ sequentially:

$$\frac{1}{n}\log M = \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right) \tag{104}$$

$$\frac{1}{n}\log M = \frac{1}{2}\log\frac{D_{i-1}}{D_i} + 3\frac{\log n}{n} \quad \text{for } 2 \le i \le L. \tag{105}$$

Clearly, the number of possible reconstructions is $M^L = \exp(nR)$ and the rate of the scheme is $R = (1/n)L\log M$. On the other hand, the complexity is of order $M \times L$, since the encoder searches for a suitable codeword among $M$ sub-codewords at each stage. Thus, for fixed rate $R$, we can say that the coding complexity (or size of codebooks) scales with $L\exp(nR/L)$, which is a decreasing function of $L$ (for $L < nR$). This shows that larger $L$ provides a lower complexity of the scheme. It is worth emphasizing that we can set $L = L_n$ to be increasing with $n$. This is because the bounds in both lemmas hold uniformly for all $L$.
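To get a feel for how the search cost $L\exp(nR/L)$ shrinks as layers are added, the following sketch evaluates it for a few values of $L$. The function name `search_complexity` and the sample parameters are illustrative assumptions, the rate is taken in nats, and the integrality of $M$ is ignored as in the text.

```python
import math


def search_complexity(n, rate, num_layers):
    """Order of the number of codeword comparisons: L * exp(n * R / L), R in nats."""
    return num_layers * math.exp(n * rate / num_layers)


# Example: block length 100 at rate 0.5 nats per symbol.
costs = {L: search_complexity(100, 0.5, L) for L in (1, 2, 5, 10, 25)}
```

Since the derivative of $L\exp(nR/L)$ with respect to $L$ is $\exp(nR/L)(1 - nR/L)$, the cost decreases in $L$ as long as $L < nR$, which is the regime of interest here.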

On the other hand, in both lemmas, the overall rate can be bounded by

$$R(D) + \sqrt{\frac{V(D)}{n}}\,Q^{-1}(\epsilon) + kL\frac{\log n}{n} + O\left(\frac{\log n}{n}\right) \tag{106}$$

for some constant $k$, where we denote by $R(D)$ and $V(D)$ the rate-distortion function and the source dispersion. However, the optimum rate is given by

$$R(D) + \sqrt{\frac{V(D)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{\log n}{n}\right). \tag{107}$$

We can see that there is a penalty term $kL_n(\log n/n)$ because of using layered coding. If $L_n$ grows too fast with $n$ in order to achieve low complexity of the scheme, then the rate penalty term $L_n(\log n/n)$ can be too large and we may lose (second-order) rate optimality. This shows the trade-off between the rate and complexity of the scheme. Consider the following examples, which are valid for both the Gaussian and binary cases.

• If $L_n = L$ is constant, then the scheme achieves the rate-distortion function and the dispersion as well, but the complexity is exponential (albeit with a smaller exponent).

• If $L_n(\log n/n) \to 0$ as $n \to \infty$, we can achieve the rate-distortion function. For example, if $L_n = n/\log^2 n + 1$, then the achieved rate is

$$R = R(D) + O\left(\frac{1}{\log n}\right), \tag{108}$$

i.e., the scheme achieves the rate-distortion function as $n$ increases, while the coding complexity is of order $L_n\exp(nR/L_n)$, which is roughly $(n/\log^2 n)\,n^{R\log n}$. Note that the excess distortion probability $\epsilon$ is fixed. We would like to point out that the complexity is near polynomial in $n$.

• If $L_n(\log n/\sqrt{n}) \to 0$ as $n \to \infty$, we can achieve the source dispersion. For example, if $L_n = \sqrt{n}/\log^2 n + 1$, then the achieved rate is

$$R = R(D) + \sqrt{\frac{V(D)}{n}}\,Q^{-1}(\epsilon) + O\left(\frac{1}{\sqrt{n}\log n}\right). \tag{109}$$

Note that $R - R(D)$ is inversely proportional to $\sqrt{n}$ with coefficient $\sqrt{V(D)}\,Q^{-1}(\epsilon)$; in other words, layered coding can achieve the second order optimum rate. On the other hand, the coding complexity is of order $L_n\exp(nR/L_n)$, which is roughly $(\sqrt{n}/\log^2 n)\exp(R\sqrt{n}\log^2 n)$ and is better than the original exponential complexity.
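The two growing choices of $L_n$ above can be compared numerically. The sketch below evaluates the rate penalty $L_n\log n/n$ for each; the function names are illustrative assumptions, the constant $k$ is dropped, and $L_n$ is left as a real number (integrality is neglected, as in the text).

```python
import math


def layers_first_order(n):
    """L_n = n / log^2(n) + 1: enough to reach the rate-distortion function."""
    return n / math.log(n) ** 2 + 1


def layers_second_order(n):
    """L_n = sqrt(n) / log^2(n) + 1: enough to also reach the dispersion term."""
    return math.sqrt(n) / math.log(n) ** 2 + 1


def rate_penalty(n, num_layers):
    """Layering penalty L * log(n) / n (the constant k is omitted)."""
    return num_layers * math.log(n) / n


# The first choice makes the penalty vanish; the second makes it vanish
# even after scaling by sqrt(n), i.e. it is negligible next to 1/sqrt(n).
first = [rate_penalty(n, layers_first_order(n)) for n in (10**3, 10**6)]
second = [math.sqrt(n) * rate_penalty(n, layers_second_order(n)) for n in (10**3, 10**6)]
```

In both lists the value at $n = 10^6$ is smaller than at $n = 10^3$, in line with the $1/\log n$ decay of the respective penalty terms.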

2) Generalized Successive Refinability: We would like to emphasize another interesting feature of layered coding. Layered coding can be viewed as a successive refinement scheme with $L$ decoders. Since our result allows $L = L_n$ to be increasing with $n$, this can be viewed as another generalized version of successive refinement. If the source is either binary or Gaussian and $\lim_{n\to\infty} L_n(\log n/n) = 0$, the source is successively refinable with infinitely many decoders, where the rate increment is negligible. For comparison, in the classical successive refinement result, the number of decoders is not increasing and the rate increment between neighboring decoders is strictly positive. In [13], this property is termed infinitesimal successive refinability, and the results here establish that Gaussian and binary sources are infinitesimally successively refinable sources (under the relevant distortion criteria). Moreover, if we further assume $\lim_{n\to\infty} L_n(\log n/\sqrt{n}) = 0$, each decoder can achieve the optimum distortion including the dispersion term. In this case, we can say that the binary and Gaussian sources are strongly infinitesimally successively refinable sources.

In [13], the authors also pointed out that infinitesimal successive refinability yields another interesting property called ratelessness. Consider a binary or Gaussian source with $\lim_{n\to\infty} L_n(\log n/n) = 0$, where the decoder received only the first fraction $\alpha$ of the messages, i.e., $(m_1, m_2, \ldots, m_{\alpha L})$ for some $0 < \alpha < 1$. Based on the proofs of Lemma 14 and Lemma 15, the decoder will still be able to reconstruct the source sequence with distortion $D(\alpha R)$, which is the minimum achievable distortion at rate $\alpha R$. If we have $\lim_{n\to\infty} L_n(\log n/\sqrt{n}) = 0$, an even stronger ratelessness property can be established. In this case, the decoder can achieve the optimum distortion including dispersion terms.

VII. Conclusions 

We have considered the problem of successive refinement with a focus on the optimal rate including the second order dispersion term. We have proposed the concept of "strong successive refinability" of the source and obtained a sufficient condition for it. In particular, any discrete memoryless source under Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable. We also show that the complexity of point-to-point source coding can be reduced using the idea of successive refinement. For binary and Gaussian sources, we characterize an achievable trade-off between rate and complexity of the scheme. We establish, for these cases, the existence of schemes which are infinitesimally successively refinable and rateless, and which achieve the optimum dispersion with sub-exponential complexity. Alternatively, essentially polynomial complexity is attainable if one is willing to back off from attaining the dispersion term.


Appendix 


A. Derivative of Rate-Distortion Function 

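For fixed $D > 0$, the rate-distortion function is a mapping from $C_m$ to $\mathbb{R}$ where $C_m = \{(x_1, \ldots, x_m) : x_i \ge 0, \sum_{i} x_i = 1\} \subset \mathbb{R}^m$. Note that the tangent space of $C_m$ is an $(m-1)$-dimensional hyperplane that contains $C_m$ itself. We say $R(\cdot, D)$ is differentiable at $P = (p_1, \ldots, p_m) \in C_m$ if there is an extension $\tilde{R}(\cdot, D) : \mathbb{R}^m \to \mathbb{R}$ which is differentiable at $P$. The derivative of $R(\cdot, D)$ is defined by the derivative of its extension, i.e.,

$$R'(P, D) = \left(\frac{\partial \tilde{R}(P, D)}{\partial p_1}, \frac{\partial \tilde{R}(P, D)}{\partial p_2}, \ldots, \frac{\partial \tilde{R}(P, D)}{\partial p_m}\right). \tag{110}$$

Since $C_m$ is smooth, the derivative $R'(P, D)$ is well-defined in the following sense [22]. Let $\tilde{R}_1(\cdot, D) : \mathbb{R}^m \to \mathbb{R}$ be another extension of $R(\cdot, D)$; then for any $Q \in C_m$, we have

$$\langle \tilde{R}_1'(P, D), Q - P \rangle = \langle R'(P, D), Q - P \rangle. \tag{111}$$

This implies that the derivative along the tangent plane is the same regardless of the choice of extension. This is enough to use the Taylor series, since

$$R(Q, D) = R(P, D) + \langle R'(P, D), Q - P \rangle + \text{higher order terms}. \tag{112}$$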

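Now, consider the well-definedness of $V(P, D)$. For an extension $\tilde{R}(\cdot, D) : \mathbb{R}^m \to \mathbb{R}$, the source dispersion is defined by

$$V(P, D) = \mathrm{VAR}\left[R'(P, D)\right] \tag{113}$$

$$= \sum_{i=1}^{m} p_i\left(\frac{\partial \tilde{R}(P, D)}{\partial p_i}\right)^2 - \left(\sum_{i=1}^{m} p_i\frac{\partial \tilde{R}(P, D)}{\partial p_i}\right)^2. \tag{114}$$

Suppose $\tilde{R}_1(\cdot, D)$ is another extension of $R(\cdot, D)$; then (111) implies that

$$\tilde{R}_1'(P, D) = R'(P, D) + a\mathbf{1}_m$$

for some $a \in \mathbb{R}$, where $\mathbf{1}_m = (1, 1, \ldots, 1)^T \in \mathbb{R}^m$. Then, we have

$$\mathrm{VAR}\left[\tilde{R}_1'(P, D)\right] = \sum_{i=1}^{m} p_i\left(\frac{\partial \tilde{R}_1(P, D)}{\partial p_i}\right)^2 - \left(\sum_{i=1}^{m} p_i\frac{\partial \tilde{R}_1(P, D)}{\partial p_i}\right)^2 \tag{115}$$

$$= \sum_{i=1}^{m} p_i\left(\frac{\partial \tilde{R}(P, D)}{\partial p_i} + a\right)^2 - \left(\sum_{i=1}^{m} p_i\frac{\partial \tilde{R}(P, D)}{\partial p_i} + a\right)^2 \tag{116}$$

$$= \sum_{i=1}^{m} p_i\left(\frac{\partial \tilde{R}(P, D)}{\partial p_i}\right)^2 - \left(\sum_{i=1}^{m} p_i\frac{\partial \tilde{R}(P, D)}{\partial p_i}\right)^2 \tag{117}$$

$$= \mathrm{VAR}\left[R'(P, D)\right]. \tag{118}$$

Therefore, $\mathrm{VAR}[R'(P, D)]$ does not depend on the particular choice of extension.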
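The invariance argument amounts to the fact that a variance taken under $P$ does not change when the same constant $a$ is added to every coordinate of the gradient. A small numerical check is sketched below; the helper name `variance_under` and the sample vectors are illustrative assumptions and do not correspond to an actual rate-distortion gradient.

```python
def variance_under(p, g):
    """Variance of the values g_i under the probability vector p."""
    mean = sum(pi * gi for pi, gi in zip(p, g))
    return sum(pi * (gi - mean) ** 2 for pi, gi in zip(p, g))


# A probability vector and an arbitrary stand-in "gradient".
p = [0.2, 0.3, 0.5]
g = [1.0, -2.0, 0.5]

# Shift every coordinate by the same constant a, as in R_1' = R' + a * 1_m.
a = 3.7
g_shifted = [gi + a for gi in g]

v_original = variance_under(p, g)
v_shifted = variance_under(p, g_shifted)
```

The two variances coincide up to floating-point error, mirroring the chain of equalities ending in (118).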

The same argument holds for $R'(P, D_1, D_2)$ and $V(P, D_1, D_2)$ as well. More precisely, for any $Q \in C_m$ and extensions $\tilde{R}(P, D_1, D_2)$ and $\tilde{R}_1(P, D_1, D_2)$, we have

$$\langle R'(P, D_1, D_2), Q - P \rangle = \langle \tilde{R}_1'(P, D_1, D_2), Q - P \rangle. \tag{119}$$

Also, $\mathrm{VAR}[R'(P, D_1, D_2)]$ does not depend on the particular choice of extension.


B. Proof of Refined Covering Lemma for Successive Refinement 

The proof is similar to the proof of ||5] Lemma 1], however, we have to consider vanishing terms more carefully in 
order to deal with source dispersions. Given type class 7p, we want to construct sets Bi C Xf and B 2 {xi) C X 2 
for all x” G Bi such that 

TpC \J (120) 

B{x’l,Di)c U B 2 {x^,D 2 ) forallx^GRi, (121) 

x^£B 2 {x^) 

where Bi{x2,D) = {x" G X" : di(x",x”) < D} for i G {1,2}. We construct such sets using conditional types. 
Let 


B* ^Bi - - |A| 
n 

D 2 =D2 - - IVI 

n 


Vi 


Vi 


dm 


1 ^X 2 A 1 ) dm- 


Then, there exist probability kernels IVi : X ^ Xi and W 2 : X x Xi —)■ X 2 such that 

/(A;Ai) =Ri(P,B*) 

I(X;Xi,X2) =R(P,D*,D^) 
where the joint law of (A, Ai, A 2 ) is P x Wi x W 2 and 
E 


E 


di(A, Ai) = ^ P(x)lVi(xi|x)(ii(x,Xi) < D} 

X,Xi 

d2(A, A 2 ) = ^ P(x)lVi(xi|x)lV2(x2|x,Xi)(i2(a:,a;2) <-D2- 


( 122 ) 

(123) 

(124) 

(125) 

(126) 
(127) 


The structure of kernels are described in Figure |2] 

Let [IVi] and [W 2 ] be rounded versions of IVi and W 2 so that n[VLi](xi|x)P(x) and n[kF 2 ](x 2 |a;,*i)[lVi](xi|x) 
P(x) are integers for all x,Xi,X 2 - Clearly, for all x,Xi,X 2 , 

1 


nP{x) 


|[lVi](xi|x) - lVi(xi|x)| < 


(128) 
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: X 




TX2 : X X ^ X2 




Fig. 2. Structure of Kernels 


- W,{i,\x.n)\ (12!') 

Let T[Wi]{x"') be the conditional type class of [VLi] given a;", and 7[W2]{^"'j ^ 1 ) be the conditional type class of 
[IL 2 ] given (a:",a;”). Then, following lemma shows that a;",a;" and from those type classes satisfy distortion 
constraints. 

Lemma 16: For any a;” G Tp, G 7[Wi](tc") ™tl G T[W 2 ]{^^^^i)’ we have 

di{x^,x^)<Di (130) 

d2ix”,x^)<D2. (131) 


The proof of Lemma [16] is given in Appendix |F| 

To construct the codebook, we further let \Q\ be a marginalized type of Xi and [V 2 ] be a marginalized kernel 
from Xi to .T 2 . More precisely. 


[Q](^i) = [Wi]{xi\x)P{x) 


[V2]{X2\Xl) 


1 

[Q](*i) 


[W2](x2|a;,xi)[Wi](ii|a;)P(a;). 


(132) 

(133) 


We further let Gi = T[q], Gi(a:”) = T[Wi]{x"‘), ^ 2 (^ 1 ) = 7 [v 2 ](^i) 62 ( 3 ;”,i”) = ( 2 ^"; ^1 ) for all 

x" G Tp, Xi G Gi- It is clear that Gi(x"') C Gi and G 2 {x^,Xi) C G 2 (x"). We generate codebook randomly 
based on these sets. 

LetZ^ = (Zi,.-- ,Zm) be a randomly generated codebook where Zi,... , Zm G are i.i.d. random variables 
that has uniform distribution over Gi. Also, for given Zi = Zi, let = (S^ 1 , • • • ,2^ ^v) C be i.i.d. random 

variables uniformly distributed over G 2 {zi). The size of codebook M and N will be specified later. We denote 
Ui{Z^) the set of source words that are not covered by the codebook Z^, i.e.. 


Ui{Z^) ={x” G Tp : di(x”, Z,) > Di, for all 1 < i < M}. 


(134) 


Also, for each 1 < i < M, let U 2 {^^) be the set of source words that are covered by Zi but not covered by the 
codebook E.^, i.e.. 


W 2 ( 5 f) ={x’^ G Tp : < Pi,d 2 (x",S,,,)) > D 2 , for all 1 < j < N}. 


(135) 
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If we can show that E [|W(Z™) U (yifLiU 2 {^f ))\\ < 1, then we can say that there exist sets Bi and i?2(5i) 
that satisfy (I1201 i and (I1211 i. This is because the random variable only gets integer values, and the fact that its 
expectation is less than one implies that there exists an event of the variable being equal to zero with non-zero 
probability, as required. We will show that the expectation can be made to be less than one, by taking M and N 
to be large enough, but not too large so that (l44l i and (l45T l are satished. Note that this argument is similar to that 
of lUll Chapter 9], 

We begin with union bound. 


E[|Wi(Z^)U(U™iW2(Sf))|] = ^ Pr 

x'^^Tp 

M 

< ^ Pr[a:"eWi(Z^)] + ^ ^ Pr [x" e )] ■ 

x'^^Tp x^^Tp 

We can bound the hrst term using type counting lemma. 



(136) 

(137) 


^ Pr[x" GWi(Z’”)] = ^ (l-Pr[di(x",Zi)<79i])'^ (138) 

x'^^Tp x^^Tp 


^ E 

x^^Tp 



Gi(x") 


M 


(139) 


< exp 

x'^^Tp 



< Y exp(-(n + l)-l'^H'*i|exp(n(i7([Xi]|X)-iT([li])))M) 

x^^Tp 


= \Tp \exp (-(n + exp{n{H{[X,]\X) - Hi[X,])))M^ 

< exp{nH{P)) exp (n + exp(—nJ(2f; [Xi]))M^ 

where the joint law of (X, [Xi], [X 2 ]) is P x [Wi] x [W 2 ]- Note that (I1411 i is because of ( |39] ) and (|42]) . while 
is due to dJTl) . 

We can bound the second term using a similar technique. 


(140) 

(141) 

(142) 

(143) 

dm 


Pr [x" e G2(5f)] 

= Pr[di(x",Z0 <Pi,d2(x",5,.,) >P2,Vj] 


E E Pr[di(x”,x”) < i9i,d2(x”,.=jj) > L> 2 ,Vj I Zi = x”] 


|Gi| . 


x^ GGi 


|G 


^ ^ Pr [d 2 (x", >D 2 \Z, = x^] 

G2ix^,x^) 


x^€Gi 

di{x^ ,x^)‘<Di 


— E 

di{x'^ ,x'^)<Di 


exp —N 


\G 2 im 


(144) 

(145) 

(146) 


(147) 
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1 


E 


d\{x'^ ,x'^)<D\ 


exp 


(-iV(n+l) 


-IX\- Xi ■ X2 


exp{-n{Hi[X2]\[Xi]) - H{[X2]\X, [Ij 


(148) 


Finally, we get 


M 


E E^(*"ew2(sf)) 

x'^eTp i=l 

< M ITpI exp (-lV(n + i)-\x\-\Xi\-\x 2 \exp{-n{H{[X 2 ]\[Xi]) - H{[X 2 ]\X, [li 

< M\rp \exp (-lV(n + exp(-n/(X; [lalll^i]))) • 

We choose M and N that satisfy 

(n + i)I'^H'^i|+ 2 exp(n/(X; [li])) < M < (n + l)l'^H-^i|+4 exp(n/(X; 


(149) 

(150) 

(151) 


(n + l)l'^H'^iM'^=l+2exp(n/(X;[X2]|[Xi])) < iV< (n + l)l'^H'^iM'^^l+4exp(n/(X;[X2]|[Xi])). (152) 

If we apply such M and N to ( I1371 i. ( I1431 l and ( 11501 ). it automatically gives E \\U{Z'^) U {y)fLiU 2 {^f)) I] < 1 


for n>\X\- 

inD where 




4 + H{P) + J(2f; [Xi]). Therefore, there exists sets Bi and B 2 {xi ) that satisfies (11201) and 

2.|A’| 


1 


■\og\B,\<I{X;[X,]) + 


\og{\B,\-\B2ii-,)\)<I{X;[X,],[X2]) 




■ logn 


2-1^1- 



X 2 

+ 2.|A’|. 


+ 16 


logn 


(153) 

(154) 


n n 

for all Xi G Bi. Note that we bound log(n + 1) by 2logn. 

Then, the following lemma bounds the gap between I{X;Xi) and I{X][Xi]) (also for I(X;Xi,X 2 ) and 
I{X; [Xi], [^2])) where the proof is given in Appendix iGl 
Lemma 17: 




2 1^1- 

A”! 




1(X;X,)-I(X;IX,]) 

< 

n 


logn 



41^1- 

^1 


^”2 

I(X;X,,X2)-I(X;[Xi],lX2]) 

< 

n 




logn. 


With (11531 ) and (11541 ). we can bound the size of Bi and 52(^1 )’s by 

A-i +8 


1 . 4.|A’|. 

-logiBil <I{X-X,) + - 

n 


^\ogi\B,\ - \B2{xm <IiX;X,,X2) + 


■ logn 


6-|A|. 



^2 

+ 2.|A’|. 


+ 16 


logn 


(155) 

(156) 

(157) 

(158) 


Recall that we set Xi that satisfies I{X;Xi) = R{P, D'^). Thus, the final step of the proof should be bounding 
the difference between R{P,Di) and R{P^D\), and also between R{P, Di, D 2 ) and R{P, D^, D^). 

Lemma 18: For large enough n, we have 

logn 


Ri{P,Dl) <Ri{P,Di) + 


(159) 
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R{P,DID*) <R(P,Di,D2) + 


log 77 


The proof is given in Appendix |H] 
Finally, we have 

1 


log|Bi| <RiiP,Di) + {4-\X\ 




9 ) 


log(|Bi| • \B 2 ix^)\) <R{P,D,,D 2 ) + (6 • IT”! 
We can see that the coefficients of the \ogn/n terms are 




logn 


A”, 




fci =4.|A’|- 
k2 =Q-\X\ ■ 


A”, 


2.|A’|. 


A”! 


17 




17) 


logn 


which are independent of the distribution P and block length n. This concludes the proof of the lemma. 


(160) 


(161) 

(162) 


(163) 

(164) 


C. Proof of Corollary |9] 

By Lemma[8] there exist Bi, {B 2 {xi)}x’^^Bi that successively (Hi,Il 2 )-cover 7q where 

1 loe! Ti 

-log\Bi\ <Ri{Q,Di) + ki - (165) 

n n 

1 loff Tl 

-log(|Hi| • \B 2 {x^)\) <R{Q,Di,D 2 ) + k 2 ^- for all x^ G Hi. (166) 

n n 

For simplicity, we neglect the fact that the number of messages and the size of sets are integers. Let Mqy = e"^ 
and let Mq ^2 that satisfies Mq ^Mq 2 = |Hi| • \B 2 (Pi)\- Then, (fSOl l and (ISTT l hold by definition. Then, 

we can find an one to one function 


h-. y ({i?}xH2(x(‘))^{l,...,MQ,i}x{l,...,MQ,2} (167) 

such that x" can be uniquely recovered based only on mi where (mi, m 2 ) = /ii(xi, X 2 )> t.e., there exists a function 
h such that x" = hirrii). This is because |i?i| < Mq i. 

For all x^ G Tq, there exists x" G Bi and X 2 G i? 2 (xi) such that (ii(x",Xi) < Di and d 2 (x",X 2 ) < H 2 . 
Let /q,i(x") and fQ^2{x^) be the first argument and the second argument of /i(xi,X2), respectively. Further let 
5Q.i(wi) = h{mi) and 5Q,2(mi,m2) be an inverse function of h{-, •). By construction of Bi and {B 2 {xi)}x"e b^, 
encoder and decoder satisfies dull and (Hill. 

Note that Mg i has to be an integer, and may not be exactly equal to e”^. However, we can set (1/n) logMg i 
to be close to R, i.e.. 




n 


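(168)

D. Bound O (log n/n) term for Gaussian case

Theorem 19 (Berry-Esseen Theorem [23]): Let $Z^n$ be i.i.d. random variables with $E[Z_i] = 0$, $E[Z_i^2] = \sigma^2$ and $E|Z_i|^3 = \rho < \infty$. Let $F_n$ be the cumulative distribution function of $\left(\sum_{i=1}^{n} Z_i\right)/(\sigma\sqrt{n})$ and let $\Phi$ be the cumulative distribution function of the standard normal distribution. Then, for all $n$,

$$\sup_x |F_n(x) - \Phi(x)| \le \frac{C\rho}{\sigma^3\sqrt{n}}. \tag{169}$$

In [24], Shevtsova showed that the optimum $C$ is smaller than $1/2$.

Let $X^n$ be i.i.d. Gaussian random variables with zero mean and variance $\sigma^2$. Then, for $r^2 > n\sigma^2$, we have

$$\Pr\left[\sum_{i=1}^{n} X_i^2 > r^2\right] = \Pr\left[\frac{1}{\sqrt{2n}\,\sigma^2}\sum_{i=1}^{n}(X_i^2 - \sigma^2) > \frac{r^2 - n\sigma^2}{\sqrt{2n}\,\sigma^2}\right] \tag{170}$$

$$\le Q\left(\frac{r^2 - n\sigma^2}{\sqrt{2n}\,\sigma^2}\right) + \frac{1}{2}\cdot\frac{15\sigma^6}{2\sqrt{2n}\,\sigma^6} \tag{171}$$

where we want this probability to be smaller than $\epsilon$. Thus, we can set $r$ such that

$$r^2 = n\sigma^2 + \sqrt{2n}\,\sigma^2\,Q^{-1}\left(\epsilon - \frac{15}{4\sqrt{2n}}\right). \tag{172}$$

By Corollary 12, we can cover the $r$-ball with $M_1$ number of $\sqrt{nD_1}$-balls where

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{r^2}{nD_1} + \frac{5}{2}\frac{\log n}{n} + \frac{1}{n}\log k_s \tag{173}$$

$$= \frac{1}{2}\log\frac{n\sigma^2 + \sqrt{2n}\,\sigma^2\,Q^{-1}\left(\epsilon - \frac{15}{4\sqrt{2n}}\right)}{nD_1} + \frac{5}{2}\frac{\log n}{n} + \frac{1}{n}\log k_s \tag{174}$$

$$= \frac{1}{2}\log\frac{\sigma^2}{D_1} + \frac{1}{2}\log\left(1 + \sqrt{\frac{2}{n}}\,Q^{-1}\left(\epsilon - \frac{15}{4\sqrt{2n}}\right)\right) + \frac{5}{2}\frac{\log n}{n} + \frac{1}{n}\log k_s \tag{175}$$

$$\le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}\left(\epsilon - \frac{15}{4\sqrt{2n}}\right) + \frac{5}{2}\frac{\log n}{n} + \frac{1}{n}\log k_s. \tag{176}$$

Using Taylor's expansion, one can bound $Q^{-1}(\epsilon - 15/(4\sqrt{2n}))$ by $Q^{-1}(\epsilon) + O(1/\sqrt{n})$. Finally, we have

$$\frac{1}{n}\log M_1 \le \frac{1}{2}\log\frac{\sigma^2}{D_1} + \sqrt{\frac{1}{2n}}\,Q^{-1}(\epsilon) + \frac{5}{2}\frac{\log n}{n} + O\left(\frac{1}{n}\right) \tag{177}$$

where the $O(1/n)$ term does not depend on $L$ or $D_i$.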
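The choice of the radius in (172) can be evaluated numerically with the Python standard library. This is a sketch under stated assumptions: `q_inverse` and `covering_radius_squared` are hypothetical helper names, $Q^{-1}$ is computed from the standard normal quantile function, and the Berry-Esseen slack $15/(4\sqrt{2n})$ is taken from the derivation above.

```python
import math
from statistics import NormalDist


def q_inverse(p):
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - p)


def covering_radius_squared(n, sigma2, eps):
    """r^2 = n*sigma^2 + sqrt(2n)*sigma^2 * Q^{-1}(eps - 15/(4*sqrt(2n)))."""
    correction = 15.0 / (4.0 * math.sqrt(2.0 * n))    # Berry-Esseen slack
    if not 0.0 < eps - correction < 1.0:
        raise ValueError("block length n is too small for this eps")
    return n * sigma2 + math.sqrt(2.0 * n) * sigma2 * q_inverse(eps - correction)


# Example: unit-variance source, n = 10000, eps = 0.1.
r2 = covering_radius_squared(10_000, 1.0, 0.1)
r2_uncorrected = 10_000 + math.sqrt(20_000) * q_inverse(0.1)
```

The corrected radius is slightly larger than the uncorrected one, since the slack lowers the argument of the decreasing function $Q^{-1}$. For small $n$ the slack exceeds $\epsilon$ and the bound is vacuous, which is why the sketch raises an error in that case.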


E. Bound O (logn/n) term for binary case 

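Let $X^n$ be i.i.d. Bernoulli($p$) where $p < 1/2$. Then, for $1/2 > q > p$, we have

\begin{align}
\Pr\left[\frac{1}{n}\sum_{i=1}^n X_i > q\right] &= \Pr\left[\frac{\sum_{i=1}^n (X_i - p)}{\sqrt{np(1-p)}} > (q-p)\sqrt{\frac{n}{p(1-p)}}\right] \tag{178}\\
&\le Q\left((q-p)\sqrt{\frac{n}{p(1-p)}}\right) + \frac{1}{2}\cdot\frac{p}{(p(1-p))^{3/2}\sqrt{n}}, \tag{179}
\end{align}

where we want this probability to be smaller than $\epsilon$. Thus, we set $q$ such that

$$q = p + \sqrt{\frac{p(1-p)}{n}}\,Q^{-1}\left(\epsilon - \frac{1}{2\sqrt{np(1-p)^3}}\right). \tag{180}$$

By Lemma 8, we can cover $T_q$ with $M_1$ Hamming balls of radius $nD_1$, where

\begin{align}
\frac{1}{n}\log M_1 &\le h(q) - h(D_1) + k_1\frac{\log n}{n} \tag{181}\\
&\le h(p) + (q-p)h'(p) - h(D_1) + k_1\frac{\log n}{n} \tag{182}\\
&= h(p) - h(D_1) + \sqrt{\frac{p(1-p)}{n}}\,Q^{-1}\left(\epsilon - \frac{1}{2\sqrt{np(1-p)^3}}\right)h'(p) + k_1\frac{\log n}{n} \tag{183}\\
&= h(p) - h(D_1) + \sqrt{\frac{p(1-p)}{n}}\,\log\frac{1-p}{p}\,Q^{-1}\left(\epsilon - \frac{1}{2\sqrt{np(1-p)^3}}\right) + k_1\frac{\log n}{n}. \tag{184}
\end{align}

Using Taylor's expansion, one can bound $Q^{-1}\left(\epsilon - 1/(2\sqrt{np(1-p)^3})\right)$ by $Q^{-1}(\epsilon) + O(1/\sqrt{n})$. Finally, we have

$$\frac{1}{n}\log M_1 \le h(p) - h(D_1) + \sqrt{\frac{p(1-p)}{n}}\,\log\frac{1-p}{p}\,Q^{-1}(\epsilon) + k_1\frac{\log n}{n} + O\left(\frac{1}{n}\right), \tag{185}$$

where the $O(1/n)$ term does not depend on $L$ or $D_1$.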
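The threshold $q$ in the binary case can be checked against the exact binomial tail. This is a minimal sketch, assuming the Berry-Esseen correction term $1/(2\sqrt{np(1-p)^3})$ inside $Q^{-1}$; the names `threshold_q` and `binomial_tail` and the parameter values are illustrative, not taken from the paper.

```python
import math
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - eps)


def threshold_q(n: int, p: float, eps: float) -> float:
    """q = p + sqrt(p(1-p)/n) * Q^{-1}(eps - 1/(2*sqrt(n*p*(1-p)^3)))."""
    slack = 1.0 / (2.0 * math.sqrt(n * p * (1.0 - p) ** 3))  # assumed correction term
    if eps <= slack:
        raise ValueError("n is too small for this eps")
    return p + math.sqrt(p * (1.0 - p) / n) * q_inv(eps - slack)


def binomial_tail(n: int, p: float, q: float) -> float:
    """Exact Pr[(1/n) * sum X_i > q] for X_i i.i.d. Bernoulli(p)."""
    k_min = math.floor(n * q) + 1  # smallest count strictly above n*q
    total = 0.0
    for k in range(k_min, n + 1):
        # work in the log domain to avoid overflow in the binomial coefficient
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


# Illustrative parameters (assumptions).
n, p, eps = 2000, 0.11, 0.1
q = threshold_q(n, p, eps)
tail = binomial_tail(n, p, q)
print(f"q = {q:.5f}, Pr[(1/n) sum X_i > q] = {tail:.4f} (target eps = {eps})")
```

Because the Berry-Esseen bound is conservative, the exact tail is expected to fall somewhat below $\epsilon$ rather than meet it with equality.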

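F. Proof of Lemma 16

For any $x^n \in T_P$, $\hat{x}_1^n \in T_{[W_1]}(x^n)$ and $\hat{x}_2^n \in T_{[W_2]}(x^n, \hat{x}_1^n)$, we have

\begin{align}
d_1(x^n, \hat{x}_1^n) &= \sum_{x,\hat{x}_1} P(x)[W_1](\hat{x}_1|x)\,d_1(x,\hat{x}_1) \tag{186}\\
&\le \sum_{x,\hat{x}_1} P(x)W_1(\hat{x}_1|x)\,d_1(x,\hat{x}_1) + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1|\,d_M \tag{187}\\
&\le D_1' + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1|\,d_M \tag{188}\\
&= D_1. \tag{189}
\end{align}

Similarly, we have

\begin{align}
d_2(x^n, \hat{x}_2^n) &= \sum_{x,\hat{x}_1,\hat{x}_2} P(x)[W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\,d_2(x,\hat{x}_2) \tag{190}\\
&\le \sum_{x,\hat{x}_1,\hat{x}_2} P(x)[W_1](\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)\,d_2(x,\hat{x}_2) + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|\,d_M \tag{191}\\
&\le \sum_{x,\hat{x}_1,\hat{x}_2} P(x)W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)\,d_2(x,\hat{x}_2) + \sum_{x,\hat{x}_1,\hat{x}_2} \frac{1}{n}W_2(\hat{x}_2|x,\hat{x}_1)\,d_2(x,\hat{x}_2) \nonumber\\
&\quad + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|\,d_M \tag{192}\\
&\le D_2' + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1|\,d_M + \frac{1}{n}|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|\,d_M \tag{193}\\
&= D_2. \tag{194}
\end{align}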

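G. Proof of Lemma 17

Let $Q$ be

$$Q(\hat{x}_1) = \sum_{x\in\mathcal{X}} P(x)W_1(\hat{x}_1|x), \tag{195}$$

and let $[Q]$ be defined in the same way with $[W_1]$ in place of $W_1$. Then we have

\begin{align}
|Q(\hat{x}_1) - [Q](\hat{x}_1)| &= \left|\sum_{x\in\mathcal{X}} P(x)\left(W_1(\hat{x}_1|x) - [W_1](\hat{x}_1|x)\right)\right| \tag{196}\\
&\le \sum_{x\in\mathcal{X}} \frac{1}{n} \tag{197}\\
&= \frac{|\mathcal{X}|}{n}, \tag{198}
\end{align}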
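which implies $\|Q - [Q]\|_1 \le |\mathcal{X}||\hat{\mathcal{X}}_1|/n$. By [19, Lemma 2.7], we can bound the difference between entropies:

\begin{align}
|H(\hat{X}_1) - H([\hat{X}_1])| &\le \frac{|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log\frac{n}{|\mathcal{X}|} \tag{199}\\
&\le \frac{|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log n. \tag{200}
\end{align}

Using $\tau(x) = -x\log x$, we can also bound the difference between conditional entropies:

\begin{align}
|H(\hat{X}_1|X) - H([\hat{X}_1]|X)| &\le \sum_{x\in\mathcal{X}} P(x)\sum_{\hat{x}_1\in\hat{\mathcal{X}}_1} \left|\tau(W_1(\hat{x}_1|x)) - \tau([W_1](\hat{x}_1|x))\right| \tag{201}\\
&\le \sum_{x\in\mathcal{X}} P(x)\sum_{\hat{x}_1\in\hat{\mathcal{X}}_1} \tau\left(\left|W_1(\hat{x}_1|x) - [W_1](\hat{x}_1|x)\right|\right) \tag{202}\\
&\le \sum_{x\in\mathcal{X}} P(x)\sum_{\hat{x}_1\in\hat{\mathcal{X}}_1} \tau\left(\frac{1}{nP(x)}\right) \tag{203}\\
&= \sum_{x\in\mathcal{X}} \frac{|\hat{\mathcal{X}}_1|}{n}\log(nP(x)) \tag{204}\\
&\le \frac{|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log n. \tag{205}
\end{align}

The second inequality holds because $|\tau(x) - \tau(y)| \le \tau(|x - y|)$ if $|x - y| \le 1/2$, and the third because $nP(x) > 3$ for all $x$.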
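Finally, we get

\begin{align}
|I(X;\hat{X}_1) - I(X;[\hat{X}_1])| &\le |H(\hat{X}_1) - H([\hat{X}_1])| + |H(\hat{X}_1|X) - H([\hat{X}_1]|X)| \tag{206}\\
&\le \frac{|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log n + \frac{|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log n \tag{207}\\
&= \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1|}{n}\log n. \tag{208}
\end{align}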
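The effect of rounding a test channel on the mutual information can be illustrated numerically. This is a minimal sketch under a stated assumption: $[W_1](\cdot|x)$ is taken to be $W_1(\cdot|x)$ rounded to multiples of $1/(nP(x))$, which matches the per-entry error $1/(nP(x))$ used in the proof but is not spelled out in this appendix. The helper names and the example channel are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in dist if v > 0)


def round_row(row, m):
    """Round a probability vector to multiples of 1/m, keeping the sum at 1.

    Each entry moves by less than 1/m.
    """
    floors = [math.floor(v * m) for v in row]
    deficit = m - sum(floors)
    # hand the missing units to the entries with the largest fractional parts
    order = sorted(range(len(row)), key=lambda j: row[j] * m - floors[j], reverse=True)
    for j in order[:deficit]:
        floors[j] += 1
    return [f / m for f in floors]


def mutual_information(p, w):
    """I(X; Xhat) in nats for input law p and channel rows w[x]."""
    k = len(w[0])
    q = [sum(p[x] * w[x][j] for x in range(len(p))) for j in range(k)]
    h_cond = sum(p[x] * entropy(w[x]) for x in range(len(p)))
    return entropy(q) - h_cond


# Illustrative source law and test channel (assumptions).
n = 1000
p = [0.4, 0.6]
w = [[1 / 3, 1 / 3, 1 / 3], [0.1234567, 0.3, 0.5765433]]
w_rounded = [round_row(w[x], round(n * p[x])) for x in range(len(p))]

gap = abs(mutual_information(p, w) - mutual_information(p, w_rounded))
bound = 2 * len(p) * len(w[0]) * math.log(n) / n  # 2|X||Xhat_1| log(n) / n
print(f"gap = {gap:.6f}, bound = {bound:.6f}")
```

In this example the observed gap is far below the bound, which is expected: the bound is a worst case over all channels and only needs to be of order $\log n/n$ for the proof.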

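Similarly, we can bound the difference between $I(X;\hat{X}_1,\hat{X}_2)$ and $I(X;[\hat{X}_1],[\hat{X}_2])$. Recall that $(X,\hat{X}_1,\hat{X}_2)$ has joint law $P\times W_1\times W_2$ and $(X,[\hat{X}_1],[\hat{X}_2])$ has joint law $P\times[W_1]\times[W_2]$.

Let $Q$ and $[Q]$ be

\begin{align}
Q(\hat{x}_1,\hat{x}_2) &= \sum_{x} W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)P(x), \tag{209}\\
[Q](\hat{x}_1,\hat{x}_2) &= \sum_{x} [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)P(x). \tag{210}
\end{align}

Then $Q$ and $[Q]$ are close:

\begin{align}
|Q(\hat{x}_1,\hat{x}_2) - [Q](\hat{x}_1,\hat{x}_2)| &\le \sum_{x} P(x)\left|W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right| \tag{211}\\
&\le \sum_{x} P(x)\left|W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)\right| \nonumber\\
&\quad + \sum_{x} P(x)\left|[W_1](\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right| \tag{212}\\
&\le \sum_{x} \frac{1}{n}W_2(\hat{x}_2|x,\hat{x}_1) + \sum_{x} \frac{1}{n} \tag{213}\\
&\le \frac{2|\mathcal{X}|}{n}, \tag{214}
\end{align}

which implies $\|Q - [Q]\|_1 \le 2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|/n$. By [19, Lemma 2.7], we can bound the difference between entropies:

\begin{align}
|H(\hat{X}_1,\hat{X}_2) - H([\hat{X}_1],[\hat{X}_2])| &\le \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log\frac{n}{2|\mathcal{X}|} \tag{215}\\
&\le \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log n. \tag{216}
\end{align}

Note that

\begin{align}
\left|W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right| &\le \left|W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)\right| \tag{217}\\
&\quad + \left|[W_1](\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right| \tag{218}\\
&\le \frac{1}{nP(x)}W_2(\hat{x}_2|x,\hat{x}_1) + \frac{1}{nP(x)} \tag{219}\\
&\le \frac{2}{nP(x)}. \tag{220}
\end{align}

Since we assumed that $nP(x) > 3$, we have

\begin{align}
|H(\hat{X}_1,\hat{X}_2|X) - H([\hat{X}_1],[\hat{X}_2]|X)| &\le \sum_{x} P(x)\sum_{\hat{x}_1,\hat{x}_2} \left|\tau\left(W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1)\right) - \tau\left([W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right)\right| \tag{221}\\
&\le \sum_{x}\sum_{\hat{x}_1,\hat{x}_2} P(x)\,\tau\left(\left|W_1(\hat{x}_1|x)W_2(\hat{x}_2|x,\hat{x}_1) - [W_1](\hat{x}_1|x)[W_2](\hat{x}_2|x,\hat{x}_1)\right|\right) \tag{222}\\
&\le \sum_{x}\sum_{\hat{x}_1,\hat{x}_2} P(x)\,\frac{2}{nP(x)}\log\frac{nP(x)}{2} \tag{223}\\
&\le \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log n. \tag{224}
\end{align}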


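Using (216) and (224), we can bound the gap between mutual informations:

\begin{align}
|I(X;\hat{X}_1,\hat{X}_2) - I(X;[\hat{X}_1],[\hat{X}_2])| &\le |H(\hat{X}_1,\hat{X}_2) - H([\hat{X}_1],[\hat{X}_2])| + |H(\hat{X}_1,\hat{X}_2|X) - H([\hat{X}_1],[\hat{X}_2]|X)| \tag{225}\\
&\le \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log n + \frac{2|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log n \tag{226}\\
&= \frac{4|\mathcal{X}||\hat{\mathcal{X}}_1||\hat{\mathcal{X}}_2|}{n}\log n. \tag{227}
\end{align}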


H. Proof of Lemma 18

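We know that $D_1' = D_1 - |\mathcal{X}||\hat{\mathcal{X}}_1|\,d_M/n$. Using the convexity and monotonicity properties of the rate-distortion function, we find an upper bound on the difference between $R_1(P,D_1')$ and $R_1(P,D_1)$:

\begin{align}
\frac{R_1(P,D_1) - R_1(P,D_1')}{D_1' - D_1} &\le \frac{R_1(P,0) - R_1(P,D_1')}{D_1'} \tag{228}\\
&\le \frac{\log|\mathcal{X}|}{D_1'}. \tag{229}
\end{align}

Therefore, we can bound $R_1(P,D_1')$ using $R_1(P,D_1)$:

\begin{align}
R_1(P,D_1') &\le R_1(P,D_1) + (D_1 - D_1')\frac{\log|\mathcal{X}|}{D_1'} \tag{230}\\
&= R_1(P,D_1) + |\mathcal{X}||\hat{\mathcal{X}}_1|\,\frac{d_M\log|\mathcal{X}|}{nD_1'} \tag{231}
\end{align}

for large enough $n$. Similarly, by the mean value theorem, there exists a $c$ such that for large enough $n$,

\begin{align}
R(P,D_1',D_2') - R(P,D_1,D_2) &\le \left\langle \nabla R(P,D_1'',D_2''),\ (D_1' - D_1,\ D_2' - D_2)\right\rangle \tag{232}\\
&\le \frac{\log n}{n}, \tag{233}
\end{align}

where $D_1'' = cD_1 + (1-c)D_1'$ and $D_2'' = cD_2 + (1-c)D_2'$.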
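The size of the rate penalty caused by shrinking the target distortion by a term of order $1/n$ can be illustrated on the binary-Hamming example, where $R(P,D) = h(p) - h(D)$ for $0 \le D \le p \le 1/2$. The following is a minimal sketch, assuming a shrinkage $|\mathcal{X}||\hat{\mathcal{X}}_1|\,d_M/n$ with binary alphabets and $d_M = 1$, and a chord-type bound of the form $(D_1 - D_1')\log|\mathcal{X}|/D_1'$; the parameter values and function names are illustrative.

```python
import math


def h(x: float) -> float:
    """Binary entropy in nats."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def rate(p: float, d: float) -> float:
    """R(P, D) = h(p) - h(D) for a Bernoulli(p) source under Hamming distortion."""
    if not 0.0 <= d <= p <= 0.5:
        raise ValueError("formula used here needs 0 <= D <= p <= 1/2")
    return h(p) - h(d)


# Illustrative parameters (assumptions).
n, p, d1 = 1000, 0.3, 0.1
shrink = 2 * 2 * 1.0 / n           # |X| * |Xhat_1| * d_M / n with d_M = 1
d1_prime = d1 - shrink             # reduced target distortion D_1'
increase = rate(p, d1_prime) - rate(p, d1)
chord_bound = shrink * math.log(2) / d1_prime  # (D_1 - D_1') * log|X| / D_1'
print(f"rate increase = {increase:.5f}, chord bound = {chord_bound:.5f}")
```

Both quantities scale as $1/n$, so for large $n$ they are dominated by the $\log n/n$ terms that appear elsewhere in the analysis.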